This document is updated from
analyze_temp_serial_transfer_expt--28Oct24.Rmd.
There are 5 treatments: no heat (5 days of serial transfer), 6h heat, 12h heat, 24h heat, and 48h heat. Each treatment is set up with 5 technical replicates that were initially inoculated at approximately equal ratios (on Day 0).
Summary of choices: Based on the number of cells observed in true blank wells, I only include data from wells with >50 cells. Based on the flow cytometry data from Day 0 (which, judging from the blanks in the OD data, never experienced any contamination events), a misclassification rate of 1% is assumed. Contaminated replicates are defined as those in which a species that was not inoculated in that well makes up a substantially higher percentage than expected from this misclassification rate. When a contaminated well was detected, all time points associated with that well were removed from the data.
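The contamination rule above can be sketched as follows. This is an illustration only, not the filtering code used later in the analysis; the function name, the example count vectors, and the 5x safety margin above the misclassification rate are all hypothetical.

```r
# Sketch: flag a well as contaminated when a species that was not inoculated
# there exceeds the share expected from the ~1% misclassification rate.
misclass_rate <- 0.01   # assumed, from the Day 0 data
contam_factor <- 5      # hypothetical safety margin above the expected rate
flag_contaminated <- function(counts, inoculated) {
  # counts: numeric vector of the 4 species counts; inoculated: 0/1 indicator vector
  frac_absent <- counts[inoculated == 0] / sum(counts)
  any(frac_absent > contam_factor * misclass_rate)
}
flag_contaminated(c(980, 5, 10, 5), c(1, 0, 0, 0))   # FALSE: ~1% strays, as expected
flag_contaminated(c(700, 290, 5, 5), c(1, 0, 0, 0))  # TRUE: 29% of a non-inoculated species
```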
After loading the environment, I load all of the flow cytometry cell count data and information about extinct wells from the OD data. The data are also annotated.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl) # for importing data directly from Excel sheet
library(RColorBrewer) # for changing the colours of plots
library(ggbeeswarm) # for beeswarm plots
library(vegan) # to estimate diversity and for ordination (NMDS)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-8
library(ggordiplots) # for ggplotting ellipses around treatment group centroids during ordination
## Loading required package: glue
library(chemodiv) # for estimating species richness
#library(lme4) # possibly unneeded; glmmTMB (below) handles model fitting
library(glmmTMB) # for fitting and trouble-shooting GLM's
library(DHARMa) # for plotting the residuals when using glmmTMB
## This is DHARMa 0.4.7. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa')
library(rcompanion) # for r-squared estimates of GLM's
library(MuMIn) # for calculating AICc
library(performance) # for checking multicollinearity
library(effsize) # for post-hoc estimate of effect sizes
library(emmeans) # (ditto as above)
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'
library(BSDA) # for pairwise t-tests to compare effect sizes between data subsets
##
## Attaching package: 'BSDA'
##
## The following object is masked from 'package:datasets':
##
## Orange
#library(partitionBEFsp) # for partitioning the biodiversity effects
#library(ape) # for ordination ??
library(ggforce) # for plotting ellipses in ggplot
# print the complete info about packages and versions currently loaded in the environment:
sessionInfo()
## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8
## [2] LC_CTYPE=English_United Kingdom.utf8
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.utf8
##
## time zone: Europe/Paris
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggforce_0.4.2 BSDA_1.2.2 emmeans_1.10.5 effsize_0.8.1
## [5] performance_0.12.4 MuMIn_1.48.4 rcompanion_2.4.36 DHARMa_0.4.7
## [9] glmmTMB_1.1.11 chemodiv_0.3.0 ggordiplots_0.4.3 glue_1.8.0
## [13] vegan_2.6-8 lattice_0.22-6 permute_0.9-7 ggbeeswarm_0.7.2
## [17] RColorBrewer_1.1-3 readxl_1.4.3 lubridate_1.9.3 forcats_1.0.0
## [21] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
## [25] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] Rdpack_2.6.2 gld_2.6.6 sandwich_3.1-1
## [4] rlang_1.1.4 magrittr_2.0.3 multcomp_1.4-26
## [7] matrixStats_1.4.1 e1071_1.7-16 compiler_4.4.2
## [10] mgcv_1.9-1 vctrs_0.6.5 pkgconfig_2.0.3
## [13] fastmap_1.2.0 rmarkdown_2.29 tzdb_0.4.0
## [16] haven_2.5.4 nloptr_2.1.1 xfun_0.49
## [19] modeltools_0.2-23 cachem_1.1.0 jsonlite_1.8.9
## [22] tweenr_2.0.3 parallel_4.4.2 cluster_2.1.6
## [25] DescTools_0.99.58 R6_2.5.1 coin_1.4-3
## [28] bslib_0.8.0 stringi_1.8.4 boot_1.3-31
## [31] lmtest_0.9-40 jquerylib_0.1.4 cellranger_1.1.0
## [34] numDeriv_2016.8-1.1 estimability_1.5.1 Rcpp_1.0.13-1
## [37] knitr_1.49 zoo_1.8-12 Matrix_1.7-1
## [40] splines_4.4.2 timechange_0.3.0 tidyselect_1.2.1
## [43] rstudioapi_0.17.1 yaml_2.3.10 TMB_1.9.15
## [46] codetools_0.2-20 plyr_1.8.9 withr_3.0.2
## [49] coda_0.19-4.1 evaluate_1.0.1 survival_3.7-0
## [52] polyclip_1.10-7 proxy_0.4-27 pillar_1.10.1
## [55] nortest_1.0-4 stats4_4.4.2 insight_1.0.0
## [58] reformulas_0.4.0 generics_0.1.3 hms_1.1.3
## [61] munsell_0.5.1 scales_1.3.0 rootSolve_1.8.2.4
## [64] minqa_1.2.8 xtable_1.8-4 class_7.3-22
## [67] lmom_3.2 tools_4.4.2 data.table_1.16.2
## [70] lme4_1.1-35.5 Exact_3.3 mvtnorm_1.3-2
## [73] grid_4.4.2 rbibutils_2.3 libcoin_1.0-10
## [76] colorspace_2.1-1 nlme_3.1-166 beeswarm_0.4.0
## [79] vipor_0.4.7 cli_3.6.3 expm_1.0-0
## [82] gtable_0.3.6 sass_0.4.9 digest_0.6.37
## [85] TH.data_1.1-2 farver_2.1.2 htmltools_0.5.8.1
## [88] lifecycle_1.0.4 httr_1.4.7 multcompView_0.1-10
## [91] MASS_7.3-61
# set theme for all plots
fave_theme <- theme_light() + # see other options at https://ggplot2.tidyverse.org/reference/ggtheme.html
theme(text = element_text(size=15), # larger text size for titles & axes
panel.grid.major = element_blank(), # remove major gridlines
panel.grid.minor = element_blank()) # remove minor gridlines
theme_set(fave_theme)
# define a palette for plotting the 4 species
species_4pal_alphabetical = palette.colors(8, palette = "R4")[c(3, 5, 7, 2)] #in alphabetical order
species_4pal_speed = palette.colors(8, palette = "R4")[c(7, 5, 3, 2)] #from fast to slow
# define a palette for plotting the 3 treatment days
trtmt_pal = brewer.pal(4, "Set2")[c(4, 3, 1)]
# define a palette for plotting the heat duration
control_to_48h_pal <- scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9)
# define a palette for plotting the inoculated community richness
CommRich_pal <- scale_colour_viridis_d(option = "viridis", begin=0.2, end=0.95)
# define a function to find the mode of a vector. Credit to https://stackoverflow.com/questions/2547402/how-to-find-the-statistical-mode
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
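A quick sanity check of Mode(): ties resolve to whichever value appears first in the vector, because which.max() returns the first maximum.

```r
Mode(c("a", "b", "b", "c"))  # "b"
Mode(c(1, 2))                # 1 (tie: the first unique value wins)
```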
Load the cell counts data from the summary Excel files created by FCS Express. Then load the well volume data from the summary .csv files created by the Attune software.
# a function to load the fluorescent counts data (from the .xlsx files created by FCS Express)
import_flow_counts <- function(file)
return(as.data.frame(
read_excel(path=file, sheet="FCS Express Report",
# replace the column names as follows:
col_names = c("Filename",
"Gate1", "Count_grimontii",
"Gate2", "Count_putida",
"Gate3", "Count_protegens",
"Gate4","Count_veronii")))
)
# a function to load and parse the volume data
import_flow_volume <- function(file) {
raw.csv <- read.csv(file)
# keep the volume info and just enough data to identify the sample, then drop the resulting redundant rows
raw.csv %>% select(Plate, Sample, Volume) %>% unique()
}
# a function to loop through the folders containing the data files, open the .xlsx and .csv files and combine their data
import_from_files <- function(dir_vector){
# initialize variables
raw_counts <- raw_vols <- data.frame()
# loop through each directory
for(dir in dir_vector){
# get all the file names
files_v <- list.files(dir)
# identify the excel files
files_excel <- files_v[endsWith(files_v, ".xlsx")]
# and loop through all of them to extract their data
TMPraw_counts <- data.frame()
for(val in files_excel){
TMPraw_counts <- rbind(TMPraw_counts, import_flow_counts(paste0(dir, "/", val)))
}
# identify the csv files
files_csv <- files_v[endsWith(files_v, ".csv")]
# and loop through all of them to extract their data
TMPraw_vols <- data.frame()
for(val in files_csv){
TMPraw_vols <- rbind(TMPraw_vols, import_flow_volume(paste0(dir, "/", val)))
}
# concatenate the data from counts and from vols
raw_counts <- rbind(raw_counts, TMPraw_counts)
raw_vols <- rbind(raw_vols, TMPraw_vols)
rm(TMPraw_counts, TMPraw_vols)
}
return(list(raw_counts, raw_vols))
}
# get all of the raw data:
list_rawdata <- import_from_files(c("./raw_data/serial_transf--2July24", "./raw_data/serial_transf--8July24", "./raw_data/serial_transf--5Aug24", "./raw_data/serial_transf--19Aug24"))
Now we can process the data to create unique IDs for each sample. This information needs to be parsed from the Filename column for the flow counts data (i.e., the Excel files) and from the Plate column for the flow volumes data (i.e., the .csv files).
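For example, a mock filename (hypothetical, constructed to follow the assumed naming convention for days after Day 0) can be used to preview how separate_wider_regex() splits the pieces into Date, Day, Incubator, Plate, and Well columns:

```r
# relies on tidyr (loaded above via the tidyverse)
demo <- tibble(Filename = "24-07-08 Day2 -- H1.plate1 10x-dilution_B3.acs compensated")
demo %>% separate_wider_regex(Filename,
                              c(Date="24-0\\d-\\d{2}", " Day", Day="\\d", " -- ",
                                Incubator="\\w+", "\\.plate", Plate="\\d",
                                ".*dilution_", Well="\\w\\d+", "\\.acs compensated"))
```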
# start with flow counts data:
# a mistake during import left rows containing the column names; get rid of those
list_rawdata[[1]] <- list_rawdata[[1]][-grep("Filename", list_rawdata[[1]]$Filename),]
# Day0 has a different pattern in the Filename column so let's process those rows first
Day0 <- list_rawdata[[1]][grep("Day0", list_rawdata[[1]]$Filename),] %>% separate_wider_regex(Filename,
c(Date="24-0\\d-\\d{2}", " Day", Day="\\d",
".*dilution_", Well="\\w\\d+", "\\.acs compensated"))
# Now process the Filename column for the other days
NOTday0 <- list_rawdata[[1]][-grep("Day0", list_rawdata[[1]]$Filename),] %>% separate_wider_regex(Filename,
c(Date="24-0\\d-\\d{2}", " Day", Day="\\d", " -- ",
Incubator="\\w+", "\\.plate", Plate="\\d",
".*dilution_", Well="\\w\\d+", "\\.acs compensated"))
# Put the flow counts data back together into a single data.frame:
raw_Counts <- rbind(Day0 %>% mutate(Incubator=NA, Plate=0), # add in the 2 extra empty columns that are missing from Day0
NOTday0) %>% select(-Gate1, -Gate2, -Gate3, -Gate4)
rm(Day0, NOTday0)
# then do a similar thing for the volume data:
# Day0 has a different pattern in the Plate column so let's process those rows first
Day0 <- list_rawdata[[2]][grep("Day0", list_rawdata[[2]]$Plate),] %>% separate_wider_regex(Plate,
c(Date="24-0\\d-\\d{2}", " Day", Day="\\d", ".*"))
# Now process the Plate column for the other days
NOTday0 <- list_rawdata[[2]][-grep("Day0", list_rawdata[[2]]$Plate),] %>% separate_wider_regex(Plate,
c(Date="24-0\\d-\\d{2}", " Day", Day="\\d", " -- ",
Incubator="\\w+", "\\.plate", Plate="\\d"))
# Put the flow volumes data back together into a single data.frame:
raw_Vol <- rbind(Day0 %>% mutate(Incubator=NA, Plate=0), # add in the 2 extra empty columns that are missing from Day0
NOTday0) %>% rename(Well = Sample) # rename this column for consistency with the Counts data
rm(Day0, NOTday0)
# We can now combine the counts and volume data
# NB: left_join is needed here because we don't have volume data for Day 0 on 24-07-02
raw_data <- left_join(raw_Counts, raw_Vol,
by=c("Date", "Day", "Well", "Incubator", "Plate"))
rm(raw_Counts, raw_Vol)
# add annotation specifying the Heat treatment and the Incubator
# For 2July24: all samples were subjected to 6h of heat
# For 8July24: samples in the Epoch plate reader are control (no heat)
# Samples in the H1 plate reader are 48h of heat
# For 5Aug24: all samples were subjected to 12h of heat
# For 19Aug24: all samples were subjected to 24h of heat
raw_data$Heat <- 0
raw_data$Heat[which(raw_data$Date == "24-07-02")] <- 6
raw_data$Heat[which(raw_data$Date == "24-07-08" & raw_data$Incubator == "H1")] <- 48
raw_data$Heat[which(raw_data$Date == "24-08-05")] <- 12
raw_data$Heat[which(raw_data$Date == "24-08-19")] <- 24
# change the variable classes for data analysis
raw_data$Count_grimontii <- as.numeric(raw_data$Count_grimontii)
raw_data$Count_putida <- as.numeric(raw_data$Count_putida)
raw_data$Count_protegens <- as.numeric(raw_data$Count_protegens)
raw_data$Count_veronii <- as.numeric(raw_data$Count_veronii)
Finally, we can annotate the data with the sample information for each well. Note that there is a different plate layout for Day 0 (the same on all dates), and that the experiment from 24-07-02 uses a different layout than the rest of the data (see the layout .png file in the corresponding data subfolder). In addition, a few other inoculation mistakes require further modified layouts, defined below.
# the "Plate1" layout is used for all days >0 (except for 24-07-02)
layout.plate1 <- data.frame(Well = paste0(LETTERS[1:8], rep((2*1:6)-1, each=8)),
putida = c(0, 1, 1, 1, 0, 0, 0, 0,
1, 0, 0, 0, 1, 1, 1, 0,
0, 0, 1, 1, 1, 0, 1, 0,
0, 0, 1, 1, 1, 0, 1, 0,
1, 1, 1, 0, 1, 0, 0, 0,
0, rep(0,6), 1),
protegens = c(0, 0, 0, 1, 0, 0, 1, 0,
0, 1, 0, 0, 1, 0, 0, 1,
1, 0, 1, 1, 0, 1, 1, 0,
0, 0, 1, 0, 0, 1, 0, 1,
1, 1, 0, 1, 1, 0, 1, 0,
1, rep(0,6), 0),
grimontii = c(0, 0, 1, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 1, 0, 1,
0, 1, 1, 0, 1, 1, 1, 0,
1, 0, 0, 1, 0, 1, 0, 0,
1, 0, 1, 1, 1, 0, 0, 1,
1, rep(0,6), 0),
veronii = c(0, 1, 0, 0, 1, 0, 0, 0,
0, 0, 0, 1, 0, 0, 1, 0,
1, 1, 0, 1, 1, 1, 1, 0,
0, 1, 0, 0, 1, 0, 0, 0,
0, 1, 1, 1, 1, 0, 1, 1,
0, rep(0,6), 0))
### CommRich = 0 corresponds to blanks, mistakes made on Day0 are removed altogether,
### and CommRich = NA is used to indicate contamination.
# modified layout of plate1 specific for 24-07-02
layout.plate1_2Jul <- layout.plate1
layout.plate1_2Jul$putida[c(1,8, 41:48)] <- c(0, 1, 1, 1, 1, 0, 1, 0, 0, 0)
layout.plate1_2Jul$protegens[c(1,8, 41:48)] <- c(1, 0, 1, 0, 0, 1, 0, 1, 0, 0)
layout.plate1_2Jul$grimontii[c(1,8, 41:48)] <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0)
layout.plate1_2Jul$veronii[c(1,8, 41:48)] <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 1)
# modified layout of plate1 specific for mistakes made on 24-07-08
# column 4 of OD plate is swapped orientation
layout.plate1_8Jul <- layout.plate1
layout.plate1_8Jul$putida[25:32] <- layout.plate1$putida[9:16]
layout.plate1_8Jul$protegens[25:32] <- layout.plate1$protegens[9:16]
layout.plate1_8Jul$grimontii[25:32] <- layout.plate1$grimontii[9:16]
layout.plate1_8Jul$veronii[25:32] <- layout.plate1$veronii[9:16]
# add a column for community richness in all of the above df's
layout.plate1 <- layout.plate1 %>% mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all")
layout.plate1_2Jul <- layout.plate1_2Jul %>% mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all")
layout.plate1_8Jul <- layout.plate1_8Jul %>% mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all")
# the "Plate2" layout
layout.plate2 <- data.frame(Well = paste0(LETTERS[1:8], rep((2*1:6)-1, each=8)),
putida = c(1, 1, 1, 0, 1, 0, 0, 0,
1, 0, 1, 0, 0, 0, 1, 1,
1, 0, 1, 0, 0, 0, 1, 1,
1, 0, 0, 0, 1, 1, 1, 0,
rep(0,7), 0,
0, 1, 0, 1, 1, 1, 0, 0),
protegens = c(1, 0, 0, 1, 0, 1, 0, 0,
0, 1, 1, 0, 1, 0, 1, 1,
0, 1, 0, 1, 0, 0, 1, 0,
1, 0, 1, 0, 1, 1, 0, 1,
rep(0,7), 1,
0, 1, 1, 0, 1, 1, 0, 0),
grimontii = c(0, 1, 0, 1, 0, 0, 1, 0,
1, 1, 1, 0, 0, 1, 1, 0,
0, 1, 0, 0, 1, 0, 0, 1,
1, 0, 0, 1, 1, 0, 1, 1,
rep(0,7), 0,
0, 1, 1, 1, 0, 1, 1, 0),
veronii = c(0, 0, 1, 0, 0, 0, 0, 1,
1, 1, 1, 0, 1, 1, 0, 1,
1, 0, 0, 0, 0, 1, 0, 0,
1, 0, 1, 1, 0, 1, 1, 1,
rep(0,7), 1,
0, 1, 1, 1, 1, 0, 1, 0))
# modified layout of plate2 specific for 24-07-02
layout.plate2_2Jul <- layout.plate2
layout.plate2_2Jul$putida[1:32] <- layout.plate2$putida[c(9:32,41:47,40)]
layout.plate2_2Jul$protegens[1:32] <- layout.plate2$protegens[c(9:32,41:47,40)]
layout.plate2_2Jul$grimontii[1:32] <- layout.plate2$grimontii[c(9:32,41:47,40)]
layout.plate2_2Jul$veronii[1:32] <- layout.plate2$veronii[c(9:32,41:47,40)]
layout.plate2_2Jul <- layout.plate2_2Jul[1:32,] # rest of flow plate 2 is empty
# modified layout of plate2 specific for mistakes made on 24-07-08 and 24-08-19
layout.plate2_8Jul19Aug <- layout.plate2[-(9:16),] # mistakes were made in column 8 of the OD plate, so those wells are dropped
# add a column for community richness in all of the above df's
layout.plate2 <- layout.plate2 %>% mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all")
layout.plate2_2Jul <- layout.plate2_2Jul %>% mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all")
layout.plate2_8Jul19Aug <- layout.plate2_8Jul19Aug %>% mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all")
# the "Inocula" layout
layout.inocula <- data.frame(Well = paste0(LETTERS[1:8], rep((2*1:6)-1, each=8)),
putida = rep(c(1, 0, 0, 0, 1, 1, 1, 0,
0, 0, 1, 1, 1, 0, 1, NA), times=3),
protegens = rep(c(0, 1, 0, 0, 1, 0, 0, 1,
1, 0, 1, 1, 0, 1, 1, NA), times=3),
grimontii = rep(c(0, 0, 1, 0, 0, 1, 0, 1,
0, 1, 1, 0, 1, 1, 1, NA), times=3),
veronii = rep(c(0, 0, 0, 1, 0, 0, 1, 0,
1, 1, 0, 1, 1, 1, 1, NA), times=3)) %>%
mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all") %>%
filter(!is.na(CommRich))
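A quick consistency check can catch transcription slips in these hand-entered layouts (a sketch, shown for layout.plate1; note that some of the modified layouts intentionally contain fewer than 48 wells):

```r
# every well should carry a 0/1 presence flag for each species
stopifnot(nrow(layout.plate1) == 48,
          all(unlist(layout.plate1[, c("putida", "protegens", "grimontii", "veronii")]) %in% 0:1))
table(layout.plate1$CommRich)  # distribution of inoculated community richness
```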
# a function to annotate each data set with the indicated layout
# this will KEEP well blanks!
annotate_samples <- function(layout, select_date, select_plate) {
relevant_data <- raw_data %>% filter(Date==select_date, Plate==select_plate)
# for Innoc, use inner_join to combine the flow data with its annotation
if(select_plate == 0){
output_df <- inner_join(layout, relevant_data, by="Well")
}
if(select_plate != 0) {
output_df <- left_join(merge(layout, relevant_data %>% select(Day, Incubator, Heat) %>% distinct()),
relevant_data, by=c("Well", "Day", "Incubator", "Heat"))
output_df$Date <- select_date
output_df$Plate <- select_plate
}
return(output_df)
}
# now we can add the sample names for each one.
annotated.rawdata <- rbind(annotate_samples(layout = layout.inocula, select_date = "24-07-02", select_plate=0),
annotate_samples(layout = layout.plate1_2Jul, select_date = "24-07-02", select_plate=1),
annotate_samples(layout = layout.plate2_2Jul, select_date = "24-07-02", select_plate=2),
annotate_samples(layout = layout.inocula, select_date = "24-07-08", select_plate=0),
annotate_samples(layout = layout.plate1_8Jul, select_date = "24-07-08", select_plate=1),
annotate_samples(layout = layout.plate2_8Jul19Aug, select_date = "24-07-08", select_plate=2), ##
annotate_samples(layout = layout.inocula, select_date = "24-08-05", select_plate=0),
annotate_samples(layout = layout.plate1, select_date = "24-08-05", select_plate=1),
annotate_samples(layout = layout.plate2, select_date = "24-08-05", select_plate=2),
annotate_samples(layout = layout.inocula, select_date = "24-08-19", select_plate=0),
annotate_samples(layout = layout.plate1, select_date = "24-08-19", select_plate=1),
annotate_samples(layout = layout.plate2_8Jul19Aug, select_date = "24-08-19", select_plate=2))
# fixing other small mistakes in annotation:
# Day1 of 24-07-02: sample A1 from plate 2 was loaded into sample A1 plate 1.
annotated.rawdata$CommRich[which(annotated.rawdata$Date=="24-07-02" & annotated.rawdata$Day=="1" &
annotated.rawdata$Well=="A1" & annotated.rawdata$Plate=="1")] <- 3
annotated.rawdata$putida[which(annotated.rawdata$Date=="24-07-02" & annotated.rawdata$Day=="1" &
annotated.rawdata$Well=="A1" & annotated.rawdata$Plate=="1")] <- 1
annotated.rawdata$protegens[which(annotated.rawdata$Date=="24-07-02" & annotated.rawdata$Day=="1" &
annotated.rawdata$Well=="A1" & annotated.rawdata$Plate=="1")] <- 0
annotated.rawdata$grimontii[which(annotated.rawdata$Date=="24-07-02" & annotated.rawdata$Day=="1" &
annotated.rawdata$Well=="A1" & annotated.rawdata$Plate=="1")] <- 1
annotated.rawdata$veronii[which(annotated.rawdata$Date=="24-07-02" & annotated.rawdata$Day=="1" &
annotated.rawdata$Well=="A1" & annotated.rawdata$Plate=="1")] <- 1
# Annotate the treatments
annotated.rawdata$Heat_Day <- as.numeric(NA)
annotated.rawdata$Heat_Day[which(annotated.rawdata$Heat!=0 & annotated.rawdata$Day==1)] <- 1
annotated.rawdata$Heat_Day[which(annotated.rawdata$Heat>6 & annotated.rawdata$Day==2)] <- 2
annotated.rawdata$Heat_Day[which(annotated.rawdata$Heat==48 & annotated.rawdata$Day==3)] <- 3
annotated.rawdata$Recov_Day <- as.numeric(NA)
annotated.rawdata$Recov_Day[which(annotated.rawdata$Heat==6 & annotated.rawdata$Day==2)] <- 1
annotated.rawdata$Recov_Day[which(annotated.rawdata$Heat==6 & annotated.rawdata$Day==3)] <- 2
annotated.rawdata$Recov_Day[which(annotated.rawdata$Heat %in% c(12,24) & annotated.rawdata$Day==3)] <- 1
annotated.rawdata$Recov_Day[which(annotated.rawdata$Heat %in% c(12,24) & annotated.rawdata$Day==4)] <- 2
annotated.rawdata$Recov_Day[which(annotated.rawdata$Heat==48 & annotated.rawdata$Day==4)] <- 1
annotated.rawdata$Recov_Day[which(annotated.rawdata$Heat==48 & annotated.rawdata$Day==5)] <- 2
# sanity check to make sure there are no redundant rows
stopifnot(!any(duplicated(annotated.rawdata %>% select(Date, Day, Incubator, Plate, Well))))
# change some of the values to more appropriate types
annotated.rawdata$Plate <- as.numeric(annotated.rawdata$Plate)
annotated.rawdata$Day <- as.numeric(annotated.rawdata$Day)
# clean up
rm(layout.inocula, layout.plate1, layout.plate1_2Jul, layout.plate1_8Jul, layout.plate2, layout.plate2_2Jul, layout.plate2_8Jul19Aug, list_rawdata, raw_data)
The annotated data contains information on the complete dataset,
including blank wells and excluded wells. Any mistakes that were made
during inoculation on Day 0 have been removed altogether.
CommRich == 0 indicates well blanks that should be empty
(in this case, all 4 species columns will also be 0).
Finally, CommRich == NA indicates data rows that were
excluded, e.g., due to low total counts or contamination (in this case,
the 4 species columns are kept to indicate what should have been in
that excluded well).
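Given these conventions, downstream code can pull out the subset it needs; for instance (a sketch, using the annotated data frame described above):

```r
# blank wells (should be empty)
blanks   <- annotated.rawdata %>% filter(CommRich == 0)
# excluded wells; their species columns record the intended community
excluded <- annotated.rawdata %>% filter(is.na(CommRich))
# wells retained for analysis
retained <- annotated.rawdata %>% filter(!is.na(CommRich) & CommRich > 0)
```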
For reproducibility, and to check that the metadata is correctly
associated with the data, print the metadata out to file. Note that the
location on the incubated plates (corresponding to the OD data) differs
from the location on the flow cytometry plate. In the code below I
create a column for the OD_Well and assign unique identifiers to each
time series. The metadata file annotation_for_alldata.csv
summarizes all of this information.
# annotation for Day 0 lists the plate as plate 0 but let's change that to Innoc
annotated.rawdata$Plate[which(annotated.rawdata$Plate==0)] <- "Innoc"
# copy the metadata to another variable and remove the data columns
metadata <- annotated.rawdata %>% select(-Volume,
-Count_grimontii, -Count_protegens, -Count_putida, -Count_veronii)
# the "Well" and "Plate" columns are currently accurate only for the sample's location on the flow cytometer plate
metadata$filler <- "plate"
metadata <- metadata %>% unite(col="plateNum", c(filler, Plate), sep="", remove = FALSE) %>%
unite(col="FLOWplateWell", c(plateNum, Well), sep="-", remove = FALSE) %>% select(-filler, -plateNum)
#####
# add true well sample location to metadata (i.e., as corresponding to OD data)
#####
# split up the Well into separate columns for the row and column location
metadata <- metadata %>% separate_wider_regex(Well, c(row="\\w", col="\\d+"))
metadata$REALcol <- 0
# for non-Innoc days after 2 July, the pattern is simple and systematic:
# flow column c maps to OD column (c+1)/2 on plate 1, offset by 6 on plate 2
after_2jul <- which(metadata$Plate %in% c(1, 2) & metadata$Date > "24-07-02")
metadata$REALcol[after_2jul] <- (as.numeric(metadata$col[after_2jul]) + 1) / 2 +
6 * (metadata$Plate[after_2jul] == 2)
rm(after_2jul)
# for non-Innoc days on 2 July, the pattern is similar for plate 1 columns 1 to 9:
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==1)] <- 1
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==3)] <- 2
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==5)] <- 3
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==7)] <- 4
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==9)] <- 5
# the pattern changes from here:
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==11)] <- 7
metadata$REALcol[which(metadata$Plate==2 & metadata$Date == "24-07-02" & metadata$col==1)] <- 8
metadata$REALcol[which(metadata$Plate==2 & metadata$Date == "24-07-02" & metadata$col==3)] <- 9
metadata$REALcol[which(metadata$Plate==2 & metadata$Date == "24-07-02" & metadata$col==5)] <- 10
metadata$REALcol[which(metadata$Plate==2 & metadata$Date == "24-07-02" & metadata$col==7)] <- 12
# Note that plate 2 Well H7 on flow actually comes from H11
metadata$REALcol[which(metadata$Plate==2 & metadata$Date == "24-07-02" & metadata$col==7 & metadata$row=="H")] <- 11
# and for plate 1 column 1, Well H1 on flow actually comes from H6
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==1 & metadata$row=="H")] <- 6
# finally, plate 1 column1: Well A1 on flow actually comes from A6. But note the mistake on Day1
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$Day!=1 & metadata$col==1 & metadata$row=="A")] <- 6
# on 2 July Day 1, plate 1 well A1 on flow actually comes from A8
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$Day==1 & metadata$col==1 & metadata$row=="A")] <- 8
# now we are finished with the NON-INNOC annotations
# we can put together the row and REALcol columns to get the location on the OD plate
data_meta <- metadata %>% filter(Plate != "Innoc") %>% unite("OD_well", c(row, REALcol), sep="") %>% select(-col) %>%
unite("uniqID", c(Date, Incubator, OD_well), sep=" ", remove = FALSE)
# last (and perhaps least), annotate the additional blank wells from 2 July:
july2_blanks <- data_meta %>% filter(Date=="24-07-02", CommRich==0) %>%
select(-FLOWplateWell, -Plate, -uniqID, -OD_well) %>% distinct()
missing_blanks <- data.frame(OD_well=c("A1", "H1", "H12", paste0(LETTERS[1:7],11), paste0(LETTERS[2:7],6)),
FLOWplateWell=NA, Plate=NA) %>%
mutate(uniqID=paste("24-07-02 Epoch", OD_well), .keep="all")
july2_missing <- merge(july2_blanks,missing_blanks)
data_meta <- rbind(data_meta, july2_missing)
rm(july2_blanks, missing_blanks, july2_missing)
#####
# Innoc data: add OD_well and uniqID columns
#####
# To annotate the rawest version of the data, I decided to create redundant rows for the Innoc data. This way each row from Innoc appears up to 5x, with its associated OD_well and uniqID.
innoc_meta <- metadata %>% filter(Plate == "Innoc") %>% select(-row, -col, -REALcol, -Incubator, -Heat)
innoc_meta <- suppressWarnings( # left_join warns about the many-to-many relationship; this is expected, so suppress it
left_join(innoc_meta,
data_meta %>%
select(-FLOWplateWell, -Day, -Plate, -Heat_Day, -Recov_Day) %>%
distinct(), # remove the redundant rows from each day
by = c("CommRich", "putida", "protegens", "grimontii", "veronii", "Date"))
)
# trash the now old df to avoid confusion
rm(metadata)
# save the complete metadata to file
write.csv(rbind(data_meta, innoc_meta), file="./intermediate_data/annotation_for_alldata.csv", quote=FALSE, row.names=FALSE)
#####
# Save the fully annotated raw flow cytometry counts data
#####
# associate the metadata back with the raw counts data
# for Days > 0:
temp_metadata <- data_meta %>% separate_wider_regex(FLOWplateWell, c(FLOW="plate\\w+-", Well="\\w+"))
annot.days <- inner_join(temp_metadata, annotated.rawdata,
by=c("Well", "putida", "protegens", "grimontii", "veronii", "CommRich", "Date",
"Day", "Incubator", "Plate", "Heat", "Heat_Day", "Recov_Day")) %>%
unite("FLOWplateWell", c(FLOW, Well), sep="")
rm(temp_metadata, data_meta)
# for Innoc Days:
temp_metainnoc <- innoc_meta %>% separate_wider_regex(FLOWplateWell, c(FLOW="plate\\w+-", Well="\\w+"))
annot.innoc <- left_join(temp_metainnoc,
annotated.rawdata %>% select(-Incubator, -Heat),
by=c("Well", "putida", "protegens", "grimontii", "veronii", "CommRich", "Date",
"Day", "Plate", "Heat_Day", "Recov_Day")) %>%
unite("FLOWplateWell", c(FLOW, Well), sep="")
# save this to file as well
write.csv(rbind(annot.days, annot.innoc), file="./intermediate_data/flow_rawdata.csv", quote=FALSE, row.names=FALSE)
# remove annotated.rawdata as it has been superseded by annot.days and annot.innoc
rm(annotated.rawdata, temp_metainnoc, innoc_meta)
########
# fix annotation mistake for flow cytometry acquisition of uniqID "24-07-02 Epoch A6"
# this well is missing on Day 1 because A8 was pipetted there instead, leaving 2 wells annotated as "24-07-02 Epoch A8"
########
wrong_row <- which(annot.days$uniqID == "24-07-02 Epoch A8" & annot.days$Day == 1 & is.na(annot.days$Volume))
annot.days$uniqID[wrong_row] <- "24-07-02 Epoch A6"
annot.days$OD_well[wrong_row] <- "A6"
annot.days$putida[wrong_row] <- 0
annot.days$protegens[wrong_row] <- 1
annot.days$grimontii[wrong_row] <- 1
annot.days$veronii[wrong_row] <- 0
annot.days$CommRich[wrong_row] <- 2
rm(wrong_row)
The data from Day 0 (annot.innoc) consists of 3
measurements of the inoculum that is used to inoculate the 5
replicates. For the summary annotation file (aka metadata
above), each FLOWplateWell appears redundantly with the up to 5
associated uniqID and OD_well replicates. I chose to do this so that the
raw values from the flow cytometry data are preserved with their
relevant annotation.
Below, this Day 0 data is averaged across the 3 different
FLOWplateWell values. This means that, from here on, Day 0 can be
joined with the rest of the data in the variable
annotated.rawdata.
# average the Day 0 data across its redundant flow cytometry measurements
mean.innoc <- annot.innoc %>% group_by(uniqID, OD_well, Incubator, Plate, Heat, Date,
Day, Heat_Day, Recov_Day,
CommRich, putida, protegens, grimontii, veronii) %>%
summarise(Mean_putida = mean(Count_putida),
Mean_protegens = mean(Count_protegens),
Mean_grimontii = mean(Count_grimontii),
Mean_veronii = mean(Count_veronii),
SD_putida = sd(Count_putida),
SD_protegens = sd(Count_protegens),
SD_grimontii = sd(Count_grimontii),
SD_veronii = sd(Count_veronii),
Vol_mean = mean(Volume),
vol_sd = sd(Volume))
## `summarise()` has grouped output by 'uniqID', 'OD_well', 'Incubator', 'Plate',
## 'Heat', 'Date', 'Day', 'Heat_Day', 'Recov_Day', 'CommRich', 'putida',
## 'protegens', 'grimontii'. You can override using the `.groups` argument.
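The grouped-output message above is informational only. If the grouping is not needed downstream, it can be dropped (and the message silenced) by setting summarise()'s .groups argument explicitly; a toy example on a built-in dataset:

```r
# .groups = "drop" returns an ungrouped data frame and suppresses the message
mtcars %>% group_by(cyl, gear) %>%
summarise(mean_mpg = mean(mpg), .groups = "drop")
```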
# here are some plots summarizing how much variation there is between measurements of the same inocula
plotting_mean.innoc <- mean.innoc %>% pivot_longer(cols = Mean_putida:SD_veronii,
names_to = c(".value", "species"),
names_sep = "_") %>%
filter(Incubator != "H1") # the same inoculum was used for 2 treatments on 24-07-08; remove this redundancy for plotting
ggplot(plotting_mean.innoc,
aes(x=Mean, y=SD, colour=species)) +
geom_point(alpha=0.7) +
scale_colour_manual(values=species_4pal_alphabetical) +
labs(title="3 measures of inoculum")
ggplot(plotting_mean.innoc,
aes(x=Mean, y=SD, colour=Date)) +
geom_point(alpha=0.7) +
labs(title="3 measures of inoculum")
ggplot(plotting_mean.innoc %>% mutate(CV = SD/Mean),
aes(x=Mean, y=CV, colour=species)) +
geom_point(alpha=0.7) +
scale_colour_manual(values=species_4pal_alphabetical) +
labs(title="3 measures of inoculum")
## Warning: Removed 159 rows containing missing values or values outside the scale range
## (`geom_point()`).
# due to false positive counts,
# the CV blows up when I am counting species that are not actually in that sample
ggplot(plotting_mean.innoc %>% filter((putida == 1 & species == "putida") |
(protegens == 1 & species == "protegens") |
(grimontii == 1 & species == "grimontii") |
(veronii == 1 & species == "veronii")) %>%
mutate(CV = SD/Mean),
aes(x=Mean, y=CV, colour=species)) +
geom_point(alpha=0.7) +
scale_colour_manual(values=species_4pal_alphabetical) +
labs(title="3 measures of inoculum (remove absent sp)")
ggplot(plotting_mean.innoc %>% unite("community", putida:veronii),
aes(x=community, y=Mean, colour=species)) +
geom_point(alpha=0.7) +
geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD), width=.2) +
scale_colour_manual(values=species_4pal_alphabetical) +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
labs(title="3 measures of inoculum")
ggplot(plotting_mean.innoc %>% filter((putida == 1 & species == "putida") |
(protegens == 1 & species == "protegens") |
(grimontii == 1 & species == "grimontii") |
(veronii == 1 & species == "veronii")) %>%
unite("community", putida:veronii),
aes(x=community, y=Mean, colour=species)) +
geom_point(alpha=0.7) +
geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD), width=.2) +
scale_colour_manual(values=species_4pal_alphabetical) +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
labs(title="3 measures of inoculum (remove absent sp)")
ggplot(plotting_mean.innoc %>% ungroup() %>%
select(-uniqID, -OD_well) %>% distinct() %>%
filter((putida == 1 & species == "putida") |
(protegens == 1 & species == "protegens") |
(grimontii == 1 & species == "grimontii") |
(veronii == 1 & species == "veronii")) %>%
unite("community", putida:veronii) %>% mutate(CV = SD/Mean),
aes(x=community, y=CV, colour=species)) +
geom_jitter(width=0.2, alpha=0.7) +
scale_colour_manual(values=species_4pal_alphabetical) +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
labs(title="3 measures of inoculum (remove absent sp)")
# I have no idea what I would do with this info about sample volume, but here it is:
# There are no values for 24-07-02 because I forgot to save the .apx file (this was before I realized that volume is not saved in the .acs files)
ggplot(plotting_mean.innoc,
aes(x=Vol_mean, y=vol_sd, colour=Date)) +
geom_point(alpha=0.7) +
labs(title="3 measures of inoculum")
## Warning: Removed 300 rows containing missing values or values outside the scale range
## (`geom_point()`).
# finally, we can add the mean counts for the inoculum to the whole data
annotated.rawdata <- mean.innoc %>% select(-SD_putida, -SD_protegens, -SD_grimontii, -SD_veronii, -vol_sd) %>%
rename(Count_putida=Mean_putida, Count_protegens=Mean_protegens, Count_grimontii=Mean_grimontii, Count_veronii=Mean_veronii, Volume=Vol_mean)
annotated.rawdata <- rbind(annotated.rawdata, annot.days)
# cleanup
rm(annot.days, annot.innoc, plotting_mean.innoc)
I define the misclassification rate as \(\frac{\text{false positive events}}{\text{total events across all fluorescences}}\). In other words, I count the number of events in the gate(s) where I know there should be zero, then divide by the total number of fluorescent events in that well. To estimate the misclassification rate, I use the data from Day 0.
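As a toy illustration of this formula (made-up counts, not data from this experiment): suppose a putida monoculture well records 990 putida events, 8 protegens events, and 2 grimontii events.

```r
# Made-up counts for a hypothetical putida monoculture well
counts <- c(putida = 990, protegens = 8, grimontii = 2, veronii = 0)
total  <- sum(counts)  # total events across all fluorescence gates (1000)

# events in gates where the true count is known to be zero
false_pos <- counts[c("protegens", "grimontii", "veronii")]

# per-species misclassification rate = false positive events / total events
misclass_rate <- false_pos / total
misclass_rate["protegens"]  # 0.008, i.e. 0.8%
```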
# use Day0 inoculum measurements for a first pass at estimating the misclassification rate
# i.e., the rate of falsely classifying events as species A when I know for certain that species A is not present in my sample
misclass.innoc <- mean.innoc %>% mutate(Total_counts = Mean_putida + Mean_protegens + Mean_grimontii + Mean_veronii) %>% # get total for each sample
ungroup() %>% select(-uniqID, -OD_well) %>% distinct() %>% # remove any redundant data
# put each species count in its own row in the column called mean (instead of having a column for each species)
pivot_longer(cols = Mean_putida:SD_veronii,
names_to = c(".value", "species"),
names_sep = "_") %>%
filter(Incubator != "H1") %>% # remove the redundant data
# keep just the instances where we know for sure that this species was NOT present
filter((putida == 0 & species == "putida") |
(protegens == 0 & species == "protegens") |
(grimontii == 0 & species == "grimontii") |
(veronii == 0 & species == "veronii")) %>%
# misclassification rate is the number of events / total counts
mutate(mean_rate = Mean/Total_counts,
sd_rate = SD/Total_counts)
# re-order the species from fast to slow for better plotting
misclass.innoc$species <- factor(misclass.innoc$species,
levels = c("putida", "protegens", "grimontii", "veronii"))
ggplot(misclass.innoc, aes(x=species, y=mean_rate, colour=species)) +
geom_beeswarm(alpha=0.5) +
geom_errorbar(aes(ymin=mean_rate-sd_rate, ymax=mean_rate+sd_rate), width=.05, alpha=0.2) +
scale_colour_manual(values=species_4pal_speed) +
labs(title="misclassification rate in inoculum", y="mean +/- SD")
max(misclass.innoc$mean_rate)
## [1] 0.008537981
ggplot(misclass.innoc %>% unite("community", putida:veronii),
aes(x=species, y=mean_rate, colour=species)) +
facet_wrap(vars(community)) +
geom_point(alpha=0.5) +
scale_y_continuous(breaks = c(0, 0.005, 0.01)) +
scale_colour_manual(values=species_4pal_speed) +
theme(axis.text.x = element_text(angle = 90)) +
labs(title="misclassification rate in inoculum")
# summarize the mean and max misclassification rates observed for each species
misclass.innoc %>% group_by(species) %>% summarise(mean_misclass = mean(mean_rate),
max_misclass = max(mean_rate))
# clean-up
rm(misclass.innoc, mean.innoc)
From here we can clearly see that the misclassification rate can be as high as 1% and that it depends on the species. Protegens is the most likely to be falsely detected and, from the plot of all possible community combinations, the problem seems to be that putida cells are being misclassified as protegens.
But I know that this misclassification rate also depends on environmental conditions, so I don’t think it makes sense to correct the data using the exact values given above. A more conservative approach is to treat with caution any counts below 1%.
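A minimal sketch of that caution rule (a hypothetical helper, not part of the pipeline itself):

```r
# Hypothetical helper: TRUE when a species' count is small enough that it could
# plausibly be entirely false positives under an assumed 1% misclassification rate
is_suspect <- function(count, total_counts, rate = 0.01) {
  !is.na(count) & !is.na(total_counts) & count / total_counts < rate
}

is_suspect(8, 1000)    # TRUE: 0.8% could be pure misclassification
is_suspect(150, 1000)  # FALSE: far above the 1% ceiling
```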
Here we make decisions about which data to keep and which to toss.
I need to set a threshold for the minimum number of fluorescent events observed in a well, below which I decide that the well is not trustworthy.
At some point I did sample some wells that are true negatives. From these we learn that a true negative can have as many as 20 total events.
Remember that I set the stopping conditions to 10 000 events in the cell gate OR the end of the sample (which seems to be 146 uL). Let’s somewhat arbitrarily set the minimum total events per well at 51 and see what happens with that.
annotated.rawdata <- annotated.rawdata %>% mutate(Total_counts = Count_putida + Count_protegens + Count_grimontii + Count_veronii) %>%
mutate(Total_density = Total_counts/Volume)
# plot the counts and volume for true negative wells
ggplot(annotated.rawdata %>% filter(CommRich==0, !is.na(Total_counts)),
aes(x=Total_counts, y=Volume)) +
geom_point() +
labs(title="True negatives")
# plot the total counts as a histogram just to see what the dispersal is like
ggplot(annotated.rawdata, aes(x=Total_counts)) +
geom_histogram(colour="black", fill="white") +
labs(title="everything")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 387 rows containing non-finite outside the scale range
## (`stat_bin()`).
ggplot(annotated.rawdata, aes(x=Total_counts)) +
geom_histogram(colour="black", fill="white") +
scale_x_log10() +
labs(x="Total_counts in LOG SCALE!", title="everything")
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 396 rows containing non-finite outside the scale range
## (`stat_bin()`).
ggplot(annotated.rawdata, aes(x=Total_counts)) +
geom_histogram(colour="black", fill="white") +
scale_x_continuous(limits = c(-10,1010)) +
labs(title="everything")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2014 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).
# then plot the total counts against the volume because we expect these very low counts should be associated with the highest volumes
ggplot(annotated.rawdata, aes(x=Total_counts, y=Volume, colour=as.factor(Heat_Day))) +
geom_point(alpha=0.5) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
labs(colour="Day of heat") +
labs(title="everything")
## Warning: Removed 1652 rows containing missing values or values outside the scale range
## (`geom_point()`).
# okay, let's just see a histogram of the total cell density
ggplot(annotated.rawdata, aes(x=Total_density)) +
geom_histogram(colour="black", fill="white") +
scale_x_log10() +
labs(x="Total_density in LOG SCALE!") +
labs(title="everything")
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 471 rows containing non-finite outside the scale range
## (`stat_bin()`).
ggplot(annotated.rawdata %>% filter(!is.na(Heat_Day)),
aes(x=Total_density)) +
facet_grid(rows = vars(Heat_Day)) +
geom_histogram(colour="black", fill="white") +
scale_x_log10() +
labs(x="Total_density in LOG SCALE!", title="Day of Heat (everything)")
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 161 rows containing non-finite outside the scale range
## (`stat_bin()`).
ggplot(annotated.rawdata %>% filter(!is.na(Recov_Day)),
aes(x=Total_density)) +
facet_grid(rows = vars(Recov_Day)) +
geom_histogram(colour="black", fill="white") +
scale_x_log10() +
labs(x="Total_density in LOG SCALE!", title="Day of Recovery (everything)")
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 133 rows containing non-finite outside the scale range
## (`stat_bin()`).
### check what these graphs look like when I exclude wells where Total_counts < 51
ggplot(annotated.rawdata %>% filter(Total_counts > 50),
aes(x=Total_counts, y=Volume, colour=as.factor(Heat_Day))) +
geom_point(alpha=0.5) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
labs(colour="Day of heat") +
labs(title="Total_counts > 50")
## Warning: Removed 1218 rows containing missing values or values outside the scale range
## (`geom_point()`).
ggplot(annotated.rawdata %>% filter(Total_counts > 50),
aes(x=Total_density)) +
geom_histogram(colour="black", fill="white") +
scale_x_log10() +
labs(x="Total_density in LOG SCALE!") +
labs(title="Total_counts > 50")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 75 rows containing non-finite outside the scale range
## (`stat_bin()`).
ggplot(annotated.rawdata %>% filter(Total_counts > 50, !is.na(Heat_Day)),
aes(x=Total_density)) +
facet_grid(rows = vars(Heat_Day)) +
geom_histogram(colour="black", fill="white") +
scale_x_log10() +
labs(x="Total_density in LOG SCALE!", title="Day of Heat (Total_counts > 50)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(annotated.rawdata %>% filter(Total_counts > 50, !is.na(Recov_Day)),
aes(x=Total_density)) +
facet_grid(rows = vars(Recov_Day)) +
geom_histogram(colour="black", fill="white") +
scale_x_log10() +
labs(x="Total_density in LOG SCALE!", title="Day of Recovery (Total_counts > 50)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#######
# set threshold of > 50 events in total
#######
# copy everything EXCEPT BLANK WELLS to new variable
the.data <- annotated.rawdata %>% filter(CommRich != 0)
# summarize some information about the data points that I'm about to exclude
the.data %>% filter(Total_counts < 51) %>% ungroup() %>% select(uniqID, Heat, Day, Heat_Day, Recov_Day, CommRich, Volume, Total_counts) %>% summary()
## uniqID Heat Day Heat_Day
## Length:87 Min. : 0.00 Min. :2.000 Min. :2.00
## Class :character 1st Qu.:48.00 1st Qu.:3.000 1st Qu.:2.75
## Mode :character Median :48.00 Median :4.000 Median :3.00
## Mean :41.66 Mean :3.621 Mean :2.75
## 3rd Qu.:48.00 3rd Qu.:4.000 3rd Qu.:3.00
## Max. :48.00 Max. :5.000 Max. :3.00
## NA's :47
## Recov_Day CommRich Volume Total_counts
## Min. :1.000 Min. :1.000 Min. :145.0 Min. : 0.00
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:145.0 1st Qu.: 2.50
## Median :1.000 Median :2.000 Median :145.0 Median : 7.00
## Mean :1.462 Mean :1.782 Mean :145.4 Mean :10.72
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:146.0 3rd Qu.:15.00
## Max. :2.000 Max. :4.000 Max. :146.0 Max. :50.00
## NA's :48
# exclude from analysis all non-blanks rows where Total_counts < 51
the.data$CommRich[which(the.data$Total_counts < 51)] <- NA
# put the blank data back with the whole dataset
the.data <- rbind(the.data,
annotated.rawdata %>% filter(CommRich == 0))
# replace the data with NA values for all rows where Total_counts < 51
# this includes both the excluded unreliable data as well as the true blanks flow data
the.data$Count_putida[which(the.data$Total_counts < 51)] <- NA
the.data$Count_protegens[which(the.data$Total_counts < 51)] <- NA
the.data$Count_grimontii[which(the.data$Total_counts < 51)] <- NA
the.data$Count_veronii[which(the.data$Total_counts < 51)] <- NA
the.data$Total_density[which(the.data$Total_counts < 51)] <- NA
the.data$Total_counts[which(the.data$Total_counts < 51)] <- NA
# clean-up
rm(annotated.rawdata)
I have re-assigned to NA all wells that had fewer than 51 total fluorescent events. This was a total of 87 wells.
Note that I’ve also removed the flow cytometry data from the true negative wells. This was 17 wells.
# calculate densities and relative abundances for each species
the.data <- the.data %>% mutate(Conc_putida = Count_putida/Volume,
Conc_protegens = Count_protegens/Volume,
Conc_grimontii = Count_grimontii/Volume,
Conc_veronii = Count_veronii/Volume,
relDen_putida = Count_putida/Total_counts,
relDen_protegens = Count_protegens/Total_counts,
relDen_grimontii = Count_grimontii/Total_counts,
relDen_veronii = Count_veronii/Total_counts) #%>%
#select(-Total_counts)
# sanity check that the relative densities are always adding up to 1
check <- the.data %>% mutate(sum_relDen = relDen_putida + relDen_protegens + relDen_grimontii + relDen_veronii) %>%
# for convenience, remove the 87 NA values
drop_na(Total_counts)
all.equal(check$sum_relDen, rep(1, nrow(check))) %>% # use all.equal() because some values are very close to, but not exactly, 1
stopifnot()
rm(check)
I have calculated the relative densities and made sure that all relative densities add up to 1.
Before diving deeper into the data, let’s just see quickly what the time series look like:
# check: is each replicated time series annotated appropriately so that it can be pieced together?
the.data <- the.data %>% unite("community", putida:veronii, remove=FALSE) %>% ungroup()
for(com in unique(the.data$community)) {
plot( ggplot(the.data %>% filter(community==com) %>%
select(uniqID, Heat, Day, relDen_putida, relDen_protegens, relDen_grimontii, relDen_veronii) %>%
pivot_longer(cols=starts_with("relDen"), names_to="species", names_prefix="relDen_", values_to="relDensity"),
aes(x=Day, y=relDensity, colour=species, group=paste(uniqID,Heat,species))) +
facet_grid(~Heat) +
geom_point(alpha=0.2) +
geom_line(alpha=0.5) +
scale_colour_manual(values=species_4pal_alphabetical) +
labs(title=com))
}
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 28 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 92 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 48 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 40 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 36 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 28 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 40 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 32 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 68 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 64 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 32 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1516 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1516 rows containing missing values or values outside the scale range
## (`geom_line()`).
# clean up
rm(com)
After staring at the above time series for long enough, two things become clear:
1. Protegens has contaminated several wells. This contamination is unambiguous when protegens is present in communities where it was never inoculated. For these contaminated replicates, the entire time series will be excluded from the analysis.
2. The misclassification rate varies over time: e.g., putida is misclassified in the protegens monoculture (0_1_0_0) on Day 1 for 3 different heat treatments, appearing in those wells at a relative density of >10%! Similarly, protegens and veronii are misclassified in the putida monoculture (1_0_0_0) on Days 2 and 3 of the 24h heat treatment.
Of the 87 NA values identified above:
1. Some occurred as a result of flow cytometry issues, e.g., a bubble that I didn’t notice. (When I noticed a bubble, I would re-run that well, but only after I began to understand that this was happening, so some wells were unfortunately lost to this error.) IN THIS CASE: this is a true NA value. It occurs at only one time point (which may or may not be a heat day), and there is data for this well during the recovery period.
2. Some occurred because prolonged heat exposure dropped the total density in that well below the threshold of detection. This only happened on Day 3 of heat for the longest heat treatment, and there is data for this well during the recovery period. IN THIS CASE: this is a true NA value.
3. Others occurred because prolonged heat exposure drove the well to complete extinction. There is no flow cytometry data for this well during the recovery period because it went extinct. Extinction needs to be confirmed against the OD data. IN THIS CASE: this is a true NA value during the heat treatment, but it should become a 0 value during the recovery period.
The OD data is analyzed in the file called
main_expt--OD_analysis.Rmd. This script outputs a csv file
indicating the extinct wells, which I will use below.
# import extinct well data from file
extinct <- read.csv("./intermediate_data/extinctOD_wells.csv")
# for extinct wells, we know that there was no detectable growth on Recovery days, so replace the current values with true 0's here
the.data[which(the.data$uniqID %in% extinct$uniqID & the.data$Recov_Day>0),] <- the.data[which(the.data$uniqID %in% extinct$uniqID & the.data$Recov_Day>0),] %>%
mutate(Total_density=0, Conc_putida=0, Conc_protegens=0, Conc_grimontii=0, Conc_veronii=0,
relDen_putida=0, relDen_protegens=0, relDen_grimontii=0, relDen_veronii=0,
CommRich=putida+protegens+grimontii+veronii)
# during the heat days, we know that there was no OD-detectable growth for (extinct$Day + 1).
# This means any flow cytometry data we have is unreliable and should be replaced with NA.
# wells where Day 2 is unreliable
tmp <- extinct %>% filter(Day == 1)
the.data[which(the.data$uniqID %in% tmp$uniqID & the.data$Day==2),] <- the.data[which(the.data$uniqID %in% tmp$uniqID & the.data$Day==2),] %>%
mutate(Total_density=NA, Conc_putida=NA, Conc_protegens=NA, Conc_grimontii=NA, Conc_veronii=NA,
relDen_putida=NA, relDen_protegens=NA, relDen_grimontii=NA, relDen_veronii=NA,
CommRich=NA)
rm(tmp)
# wells where Day 3 is unreliable (and Day 3 is a heat day!)
extinct <- extinct[-which(extinct$uniqID %in% c("24-08-19 Epoch B4", "24-08-19 Epoch D2")),]
the.data[which(the.data$uniqID %in% extinct$uniqID & the.data$Day==3),] <- the.data[which(the.data$uniqID %in% extinct$uniqID & the.data$Day==3),] %>%
mutate(Total_density=NA, Conc_putida=NA, Conc_protegens=NA, Conc_grimontii=NA, Conc_veronii=NA,
relDen_putida=NA, relDen_protegens=NA, relDen_grimontii=NA, relDen_veronii=NA,
CommRich=NA)
rm(extinct)
To address both the problem of contamination and the problem of the misclassification rate varying over time, I had to closely re-examine the raw flow cytometry data (which I did by eye, insert crying emoji).
From the Day 0 data, I hypothesized that the misclassification rate is ~1%. So let’s pull up the identity of all the flow cytometry data where >1% of the relative density is attributed to a species that was not inoculated in that well (and therefore should not be there). I then manually examined the raw flow cytometry files in FCS Express for all the wells listed below:
# identify contamination at 1%
contamin.df <- the.data %>% filter((putida == 0 & relDen_putida > 0.01) |
(protegens == 0 & relDen_protegens > 0.01) |
(grimontii == 0 & relDen_grimontii > 0.01) |
(veronii == 0 & relDen_veronii > 0.01))
contamin.df %>% filter(Date %in% c("24-08-05", "24-08-19")) %>% select(Date, FLOWplateWell, Day, community,
relDen_putida, relDen_protegens, relDen_grimontii, relDen_veronii)
The gating for all the wells listed above (and more) was double-checked by eye in FCS Express and new cell counts were exported. There appears to be a correlation between the heat environment, or at least the day of the serial transfer, and how clean or messy the gating looked; in particular, it seemed more difficult to classify species on heat days.
We know from the OD data (see main_expt--OD_analysis)
that the 24h heat treatment had no detected instance of contamination
(at least for the blank wells). Since I suspect that the
misclassification rate changes with the heat day, let’s assume that the
24h heat treatment contains no contamination events, then look
at the occurrence of species that were never inoculated in those wells as
an estimate of the misclassification rate on different days of the
serial transfer.
(In the future: it should also be possible to get a covariance matrix to estimate which species are being misclassified as which.)
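A rough sketch of what such a cross-tabulation could look like, using monoculture wells to tally which absent species the misclassified events are attributed to. This is a hypothetical illustration with made-up relative densities; only the relDen_* column names mirror the ones used above.

```r
library(tidyverse)

# Toy stand-in for monoculture rows of the data: made-up relative densities
mono <- tribble(
  ~true_sp,    ~relDen_putida, ~relDen_protegens, ~relDen_grimontii, ~relDen_veronii,
  "putida",     0.990,          0.008,             0.002,             0.000,
  "protegens",  0.005,          0.990,             0.005,             0.000,
  "grimontii",  0.001,          0.004,             0.990,             0.005
)

# Long format: one row per (true species, called-as species) pair,
# keeping only the misclassified fractions
confusion <- mono %>%
  pivot_longer(starts_with("relDen_"),
               names_to = "called_as", names_prefix = "relDen_",
               values_to = "relDen") %>%
  filter(called_as != true_sp) %>%
  group_by(true_sp, called_as) %>%
  summarise(mean_rate = mean(relDen), .groups = "drop")
```

The resulting `confusion` table has one row per (source species, called-as species) pair, which is the matrix-like summary hinted at above.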
misclass24 <- the.data %>% filter(Day > 0, Heat == 24) %>%
filter((putida == 0 & relDen_putida > 0) |
(protegens == 0 & relDen_protegens > 0) |
(grimontii == 0 & relDen_grimontii > 0) |
(veronii == 0 & relDen_veronii > 0))
# separate the correctly called species from the species that are absent
misclass24_REAL <- misclass24 %>% mutate(relDen_putida = putida * relDen_putida,
relDen_protegens = protegens * relDen_protegens,
relDen_grimontii = grimontii * relDen_grimontii,
relDen_veronii = veronii * relDen_veronii)
misclass24 <- misclass24 %>% mutate(relDen_putida = abs(putida-1) * relDen_putida,
relDen_protegens = abs(protegens-1) * relDen_protegens,
relDen_grimontii = abs(grimontii-1) * relDen_grimontii,
relDen_veronii = abs(veronii-1) * relDen_veronii)
# pivot longer so there's a column for species
misclass24_REAL <- misclass24_REAL %>% pivot_longer(cols = relDen_putida:relDen_veronii,
values_to = "relDen",
names_to = "species",
names_prefix = "relDen_") %>%
select(uniqID, Day, community, putida, protegens, grimontii, veronii,
Total_density, relDen, species)
misclass24 <- misclass24 %>% pivot_longer(cols = relDen_putida:relDen_veronii,
values_to = "relDen",
names_to = "species",
names_prefix = "relDen_") %>%
select(uniqID, Day, community, putida, protegens, grimontii, veronii,
Total_density, relDen, species)
# remove the true species from the misclass data because these are now fake 0's
misclass24 <- misclass24[-which(misclass24$putida == 1 & misclass24$species == "putida"),]
misclass24 <- misclass24[-which(misclass24$protegens == 1 & misclass24$species == "protegens"),]
misclass24 <- misclass24[-which(misclass24$grimontii == 1 & misclass24$species == "grimontii"),]
misclass24 <- misclass24[-which(misclass24$veronii == 1 & misclass24$species == "veronii"),]
# remove the single contaminated sample
misclass24 <- misclass24[-which(misclass24$protegens == 0 & misclass24$species == "protegens" & misclass24$relDen > 0.75),]
ggplot(misclass24,
aes(x=species, y=relDen, colour=species)) +
facet_wrap(vars(Day)) +
geom_beeswarm(alpha=0.5) +
scale_colour_manual(values=species_4pal_alphabetical) +
theme(axis.text.x = element_text(angle = 90)) +
labs(y="relative density of misclassified",
title="misclassification in 24h heat for different days")
# clean up
rm(misclass24, misclass24_REAL)
Recall that for the 24h treatment, Day 1 of the serial transfer had 6h of extreme heat at the end, Day 2 was entirely extreme heat, returning to the “ambient” warm temperature only in its last few hours, and Days 3 & 4 were recovery days at constant “ambient” warm temperature.
What we see from the plot above is that the misclassification rate can reach 20% (and it does depend on the day, though the first day of recovery is actually worse than the heat days themselves), although most replicates and days seem well behaved.
Therefore, let’s set 25% as the threshold for contamination: any replicate showing >25% relative density for a species that was not inoculated there is defined as contaminated. All time points from these contaminated replicates are completely removed from the downstream analysis.
rm(contamin.df) # remove anything we may have had above.
# for now let's define contamination as >25% for something that should not be there.
contamin.df <- the.data %>% filter((putida == 0 & relDen_putida > 0.25) |
(protegens == 0 & relDen_protegens > 0.25) |
(grimontii == 0 & relDen_grimontii > 0.25) |
(veronii == 0 & relDen_veronii > 0.25))
tmp <- the.data[-which(the.data$uniqID %in% unique(contamin.df$uniqID)),]
###############
# output absolute density data for analysis
###############
# Day 0 would need to be the pre-dilution absolute densities
ggplot(tmp %>% filter(Day==0) %>% select(-uniqID, -OD_well, -Heat, -Incubator) %>% distinct(),
aes(x=Date, y=Total_density)) +
geom_beeswarm() +
labs(y="")
## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_point()`).
ggplot(tmp %>% filter(Day==0) %>% select(-uniqID, -OD_well, -Heat, -Incubator) %>% distinct(),
aes(x=Date, y=Volume)) +
geom_beeswarm() +
labs(y="")
## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_point()`).
# I lost the data on flow volume for Date 24-07-02.
# But we see from the plot that there's not *that* much variation between batches.
## IMPORTANT NOTE: the Day 0 data is not featured in any of the downstream analyses for this manuscript. Therefore this interpolation doesn't actually matter.
# let's interpolate the well volumes on Day 0 of 24-07-02 by using the median well volumes for all other dates on Day 0
tmp.Day0 <- tmp[which(tmp$Day==0 & tmp$Incubator=="Epoch"),] %>% select(-uniqID, -OD_well) %>% distinct()
# get the median volume for Day 0
medianVol <- median(tmp.Day0$Volume, na.rm=TRUE)
# apply the median volume to Day 0 values from 24-07-02
tmp.Day0 <- tmp.Day0 %>% filter(Date=="24-07-02")
tmp.Day0$Volume <- medianVol
# recalculate the absolute densities for Day0
tmp.Day0 <- tmp.Day0 %>% mutate(Total_density = Total_counts/Volume,
Conc_putida = Count_putida/Volume,
Conc_protegens = Count_protegens/Volume,
Conc_grimontii = Count_grimontii/Volume,
Conc_veronii = Count_veronii/Volume)
# finally, join the estimated absolute densities for Day0 back in with the whole data
tmp.Day0.0702 <- left_join(tmp %>% filter(Day==0, Date=="24-07-02") %>% select(-Volume, -Total_density, -Conc_putida, -Conc_protegens, -Conc_grimontii, -Conc_veronii),
tmp.Day0)
## Joining with `by = join_by(Incubator, Plate, Heat, Date, Day, Heat_Day,
## Recov_Day, CommRich, community, putida, protegens, grimontii, veronii,
## Count_putida, Count_protegens, Count_grimontii, Count_veronii, FLOWplateWell,
## Total_counts, relDen_putida, relDen_protegens, relDen_grimontii,
## relDen_veronii)`
tmp <- rbind(tmp %>% filter(Date != "24-07-02"),
tmp %>% filter(Date == "24-07-02") %>% filter(Day > 0),
tmp.Day0.0702)
rm(medianVol, tmp.Day0, tmp.Day0.0702)
# finally, remove known miscalled estimates from the data
tmp <- tmp %>% mutate(Conc_putida = putida * Conc_putida,
Conc_protegens = protegens * Conc_protegens,
Conc_grimontii = grimontii * Conc_grimontii,
Conc_veronii = veronii * Conc_veronii) %>%
mutate(Total_density = Conc_putida + Conc_protegens + Conc_grimontii + Conc_veronii)
# output this data to file
absDensity <- tmp %>% filter(community != "0_0_0_0") %>%
select(uniqID, Heat, Day, Heat_Day, Recov_Day, CommRich:veronii, Total_density:Conc_veronii)
save(absDensity, file="./intermediate_data/absolute_density_data.RData")
rm(tmp, contamin.df) # `com` was already removed above, so don't rm() it again
for(com in unique(absDensity$community)) {
plot(ggplot(absDensity %>% filter(community==com) %>%
select(uniqID, Heat, Day, Conc_putida, Conc_protegens, Conc_grimontii, Conc_veronii) %>%
pivot_longer(cols=starts_with("Conc"), names_to="species", names_prefix="Conc_", values_to="absDensity"),
aes(x=Day, y=absDensity, colour=species, group=paste(uniqID,Heat,species))) +
facet_grid(~Heat) +
geom_point(alpha=0.2) +
geom_line(alpha=0.5) +
scale_colour_manual(values=species_4pal_alphabetical) +
labs(title=com))
}
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).
## (similar `geom_point()`/`geom_line()` warnings for the remaining community plots omitted)
# in the analyses below we will be interested in shannon diversity so let's already make a column for that
absDensity$Diversity <- diversity(absDensity[,c("Conc_putida", "Conc_protegens", "Conc_grimontii", "Conc_veronii")],
index = "shannon")
# first let's remove the empty wells as we won't need them anymore
absDensity <- absDensity %>% filter(community != "0_0_0_0")
# Note that there are many 0 and NA values for Total_density
summary(absDensity$Total_density)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 47.03 100.25 219.98 220.90 2674.00 72
# 0's are communities that went extinct altogether and never recovered
absDensity[which(absDensity$Total_density == 0),]
# most NA's are communities below the threshold of detection during heat that later perhaps recovered
absDensity[which(is.na(absDensity$Total_density) & absDensity$Heat>12),]
# other NA's are just missing data (e.g., due to flow cytometry clogs or just plain pipetting mistakes)
absDensity[which(is.na(absDensity$Total_density) & absDensity$Heat<12),]
# the total density data will have to be slightly adjusted for fitting to the models
absDen_forFit <- absDensity %>% filter(Day > 0)
# for the "raw" total density data that will be fitted via negative binomial GLM,
# keep the 0's in the data
# but convert NA's into epsilon values (where epsilon is just below the threshold of detection)
below_threshold_rows <- which(is.na(absDen_forFit$Total_density) & absDen_forFit$Heat>12)
absDen_forFit$Total_density[below_threshold_rows] <- (0.25*50/146)
rm(below_threshold_rows)
# re-arrange the levels of Heat so that emmeans can be run:
absDen_forFit$Heat <- as.character(absDen_forFit$Heat)
absDen_forFit$Heat[which(absDen_forFit$Heat == 0)] <- "control"
# !!! emmeans expects the control to be the very *last* level !!!
absDen_forFit$Heat <- factor(absDen_forFit$Heat,
levels = c("6", "12", "24", "48", "control"))
# clean up
rm(com)
After plotting, I also created another version of the full data that will be used for fitting the data to models: `absDen_forFit`. It excludes the Day 0 data (as this will not be analyzed hereafter).
Below, I will analyze the diversity and the productivity (AKA total
density) to understand how they change relative to the no heat control
during the resistance and the recovery period. The diversity is easier
to deal with because we can use the Shannon diversity calculation as
implemented by vegan.
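As a quick reminder of what vegan's diversity() computes, here is a toy sketch (the counts are made up, not real data):

```r
library(vegan)
# a perfectly even 4-species community has the maximum Shannon diversity, log(4)
even_comm <- c(putida = 25, protegens = 25, grimontii = 25, veronii = 25)
diversity(even_comm, index = "shannon")  # log(4) ~ 1.386
# a community dominated by one species has Shannon diversity near 0
skewed_comm <- c(putida = 97, protegens = 1, grimontii = 1, veronii = 1)
diversity(skewed_comm, index = "shannon")
```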
For the total abundances, there are extinction events (AKA 0’s) and NA values in this data – the extinction events in particular are important and meaningful parts of our data!! To deal with this issue, I will distinguish between 0’s and NA’s by using an \(x + \epsilon\) transformation, where \(\epsilon\) indicates samples that are below the threshold of detection. I will use \(\epsilon\) = 0.25 * the threshold of detection of the flow cytometer (which is 50 total fluorescent events in \(146\mu L\)).
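Concretely, the \(\epsilon\) used throughout is just a constant:

```r
# threshold of detection: 50 fluorescent events in a 146 uL acquisition volume
detection_limit <- 50 / 146        # ~0.342 events per uL
epsilon <- 0.25 * detection_limit  # ~0.0856; substituted for below-threshold NAs
epsilon
```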
We define the “average growth rate of the community” in two ways.
First, as the expectation from the inoculated communities (e.g., the
quadruplet community has expected growth rate = mean of the growth
rates of the 4 species). In other words, this assumes that all species
that were inoculated in the community remain at equal ratios, so the
average growth rate of a community is just the mean of the growth rates
of the species that were inoculated there. This is called the
community_expected_mu. (Recall that if we thought the
communities would stay fixed for the equal species ratios that we
inoculated them at, then we should use the geometric mean to calculate
the community growth rate. We use the arithmetic mean because we a
priori believe that the communities will tend to be dominated by
the faster growing species.)
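To illustrate the arithmetic vs. geometric mean distinction with made-up growth rates (these are hypothetical values, not the measured ones):

```r
# hypothetical per-species growth rates
mu <- c(putida = 0.9, protegens = 0.7, grimontii = 0.5, veronii = 0.3)
arith_mu <- mean(mu)            # what community_expected_mu uses
geom_mu  <- exp(mean(log(mu)))  # what fixed equal ratios would imply
# by AM-GM, the geometric mean is always <= the arithmetic mean
c(arithmetic = arith_mu, geometric = geom_mu)
```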
We also define it as the mean of the realized communities by using
the species mean relative densities in the no heat control condition. In
other words, this takes into account the actual relative densities of
the species that can hang out together across serial transfers and uses
that as an expectation of the community’s growth rate. This is called
the community_averaged_mu.
Another trait that we assayed was whether species are sensitive or resistant to 40C heat (resistance is a binary trait: either TRUE or FALSE). We define communities as expected to be resistant when at least one of the inoculated species is resistant. We define communities as expected to be sensitive when none of the species are resistant.
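In code, the community-level expectation is just an any() over the inoculated species' resistances (a sketch with hypothetical indicator vectors; since only P. putida is resistant in our data, this reduces to the putida presence/absence column):

```r
# per-species resistance to 40C heat (binary trait)
species_resistant <- c(putida = TRUE, protegens = FALSE, grimontii = FALSE, veronii = FALSE)
# inoculation indicators for one hypothetical community (1 = inoculated)
inoculated <- c(putida = 1, protegens = 0, grimontii = 1, veronii = 1)
# community is expected to be resistant if at least one inoculated species is resistant
community_resistant <- as.integer(any(species_resistant & inoculated == 1))
community_resistant  # 1, because putida was inoculated here
```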
# load in the stationary phase growth rate estimates from Expt1
load("./intermediate_data/expt1--all_growthcurve_data.RData")
rm(ALL_data.df, derivs.df, TTD.df) # keep just the dataframe with the growth rate estimates (mu)
# a look-up table for growth rates at 30C
growthrates.df <- Dil_growthrates.df %>% filter(Inoculum == "Stationary",
Temp == 30,
Sample %in% c("BSC001", "BSC005", "BSC019", "CK101")) %>%
arrange(desc(mu))
# a look-up table for resistance to 40C
resist.df <- Dil_growthrates.df %>% filter(Inoculum == "Stationary",
Temp == 40,
Sample %in% c("BSC001", "BSC005", "BSC019", "CK101")) %>%
mutate(resistant = ifelse(mu>0, 1, 0)) %>%
arrange(match(Sample, c("BSC001", "CK101", "BSC019", "BSC005")))
# calculate the average growth rate for the inoculated communities
absDen_forFit <- absDen_forFit %>%
mutate(community_expected_mu = (growthrates.df$mu[1]*putida +
growthrates.df$mu[2]*protegens +
growthrates.df$mu[3]*grimontii +
growthrates.df$mu[4]*veronii) /
(putida + protegens + grimontii + veronii))
# calculate the average growth rate for the realized communities
temp <- absDensity %>% filter(Heat == 0) %>% group_by(community) %>%
mutate(relDen_putida = Conc_putida/Total_density,
relDen_protegens = Conc_protegens/Total_density,
relDen_grimontii = Conc_grimontii/Total_density,
relDen_veronii = Conc_veronii/Total_density) %>%
summarise(relDen_putida = median(relDen_putida, na.rm = TRUE),
relDen_protegens = median(relDen_protegens, na.rm = TRUE),
relDen_grimontii = median(relDen_grimontii, na.rm = TRUE),
relDen_veronii = median(relDen_veronii, na.rm = TRUE)) %>%
mutate(community_averaged_mu = growthrates.df$mu[1]*relDen_putida +
growthrates.df$mu[2]*relDen_protegens +
growthrates.df$mu[3]*relDen_grimontii +
growthrates.df$mu[4]*relDen_veronii)
# get the community resistances
print(resist.df %>% select(Species, Sample, Temp, mu, resistant)) # recall that only putida is resistant
## Species Sample Temp mu resistant
## 1 P. putida BSC001 40 0.40721 1
## 2 P. protegens CK101 40 0.00000 0
## 3 P. grimontii BSC019 40 0.00000 0
## 4 P. veronii BSC005 40 0.00000 0
# so it's easy to get community resistance because it's just the presence/absence of putida
# add the information to the full data set
absDen_forFit <- inner_join(absDen_forFit, temp %>% select(community, community_averaged_mu)) %>%
mutate(resistant = putida)
## Joining with `by = join_by(community)`
# remember to also add the community growth rates and resistances to the other data frame (this one is used for the extinction analysis because Heat is numeric here)
absDensity <- inner_join(absDensity,
absDen_forFit %>%
select(community, community_expected_mu, community_averaged_mu, resistant) %>%
distinct())
## Joining with `by = join_by(community)`
# clean up
rm(temp, Dil_growthrates.df, resist.df, growthrates.df)
After backing up a bit and thinking about what the main story of the paper could be, I think the main message that I would like to tell with the paper is that heat duration has a threshold effect. So while shorter and intermediate heat durations have some effect during heat that is different from control, communities return to a similar state after recovery. On the other hand, long-duration heat events lead to extinction (i.e., either of the entire community or of vulnerable species within the community), so the communities cannot recover anymore. In other words, there’s a threshold effect where the heat-induced killing has gone on for so long that it passes a critical point and the communities recover to a different state. I don’t want to use the term “tipping point” but for sure the design of our experiment allows us to use phrases like “threshold effect” and “critical transition” sensu stricto (e.g., as explained in Munson et al., 2018).
I think it would be fantastic if I could produce a figure that summarizes the entire data in a way that builds an argument for the quintessential ball-landscape schematic that people keep showing when they talk about ecosystem stability to perturbation (e.g., see schematic in Fig. 2 of Shade et al., 2012 or the empirical figure in Fig. 3 of Jurburg et al., 2017).
Here are some tutorials on ordination: https://eddatascienceees.github.io/tutorial-rayrr13/ https://ourcodingclub.github.io/tutorials/ordination/ https://uw.pressbooks.pub/appliedmultivariatestatistics/chapter/anosim/ https://uw.pressbooks.pub/appliedmultivariatestatistics/chapter/visualizing-and-interpreting-ordinations/
To calculate the Bray-Curtis dissimilarity, we are forced to choose how to deal with NA values (most of which are found in the resistance time points and so it doesn’t really make sense to outright drop them). NA values exist for two reasons:
1. “true” missing data, where the well was not acquired at all due to technical difficulties/mistakes (only a few of this type). I used interpolation to deal with these: take the median value from the other replicates at that same community:day:treatment.
2. below-threshold-of-detection missing data, where the total density was too low to reliably estimate the cell counts. I replace the NA with a value just below the limit of detection of the cytometer (epsilon = 0.25*50/146, as above) and assume equal frequencies of the species that were inoculated in that community.
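As a reminder of what the Bray-Curtis dissimilarity does with abundances like these, a toy sketch (made-up values):

```r
library(vegan)
# two toy samples over four species
toy <- rbind(a = c(10, 0, 5, 5),
             b = c(5, 5, 5, 5))
# BC = sum|x_i - y_i| / sum(x_i + y_i) = (5 + 5 + 0 + 0) / (20 + 20) = 0.25
vegdist(toy, method = "bray")
```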
If we just followed the example tutorials directly, with columns = the four species and rows = different communities on different days under different heat treatments, then the ordination would simply split the data up by species. But that’s not what we want to understand in this case.
We want to understand how the communities are changing over time (and as a function of different heat durations) so let’s give it the data as species x time. This can be achieved by widening the data so that we have abundances of the 4 species during resistance, during early recovery, and during late recovery.
Note that I also had to keep just 3 time points from the control treatment. I chose to keep day 1 (coded as “resistance”), day 3 (coded as “early recovery”), and day 5 (coded as “late recovery”) because this way the ordination plot will show the control treatment early, middle, and late in the time series…
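The widening step below can be sketched on a toy long table (hypothetical densities for a single well and species):

```r
library(tidyr)
toy_long <- data.frame(well = "A1",
                       trtmt_day = c("resist", "early_recov", "late_recov"),
                       Conc_putida = c(0.1, 50, 120))
# one row per well; the trtmt_day values become the new column names
toy_wide <- pivot_wider(toy_long, names_from = trtmt_day, values_from = Conc_putida)
toy_wide
```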
# go back to the complete data that includes NA values for all 4 species on some days
absDen_forOrd <- absDen_forFit %>% select(-Diversity, -community_expected_mu, -community_averaged_mu, -resistant)
# NA values with Total_density == NA are "true" missing data where I failed to record the flow cytometry measurements on that day due to technical difficulties/mistakes. These can be interpolated by using the median values from the remaining community replicates
## get the median values for all communities, days, and heat treatments
median_vals <- absDen_forOrd %>% group_by(Heat, Day, community) %>%
summarise(Med_putida = median(Conc_putida, na.rm=TRUE),
Med_protegens = median(Conc_protegens, na.rm=TRUE),
Med_grimontii = median(Conc_grimontii, na.rm=TRUE),
Med_veronii = median(Conc_veronii, na.rm=TRUE))
## `summarise()` has grouped output by 'Heat', 'Day'. You can override using the
## `.groups` argument.
## get the index for the rows with "true" missing values
missing_rows <- which(is.na(absDen_forOrd$Total_density))
## loop through the missing values
for(i in missing_rows){
# find the interpolation value in the table of median values
temp_med_val <- median_vals[median_vals$Heat == absDen_forOrd$Heat[i] &
median_vals$Day == absDen_forOrd$Day[i] &
median_vals$community == absDen_forOrd$community[i],]
# replace the NA values with the median values
absDen_forOrd$Conc_putida[i] <- temp_med_val$Med_putida
absDen_forOrd$Conc_protegens[i] <- temp_med_val$Med_protegens
absDen_forOrd$Conc_grimontii[i] <- temp_med_val$Med_grimontii
absDen_forOrd$Conc_veronii[i] <- temp_med_val$Med_veronii
# clean up
rm(temp_med_val)
}
# clean up
rm(median_vals, missing_rows, i)
# on the other hand, NA values where Total_density is epsilon represent flow cytometry counts that were below the threshold of detection. In this case let's assume 1:1 ratios of inoculated strains at a total density equal to epsilon.
epsilon <- (0.25*50/146)
## get the index for the missing value rows below the threshold of detection
missing_rows <- which(is.na(absDen_forOrd$Conc_putida))
## CommRich had NA values that were supposed to indicate some differences, but that no longer matters for us; recompute it from the inoculation indicators
absDen_forOrd$CommRich <- absDen_forOrd$putida + absDen_forOrd$protegens + absDen_forOrd$grimontii + absDen_forOrd$veronii
for(i in missing_rows){
# replace the NA values with epsilon divided by the inoculated species richness
absDen_forOrd$Conc_putida[i] <- absDen_forOrd$putida[i] * epsilon / absDen_forOrd$CommRich[i]
absDen_forOrd$Conc_protegens[i] <- absDen_forOrd$protegens[i] * epsilon / absDen_forOrd$CommRich[i]
absDen_forOrd$Conc_grimontii[i] <- absDen_forOrd$grimontii[i] * epsilon / absDen_forOrd$CommRich[i]
absDen_forOrd$Conc_veronii[i] <- absDen_forOrd$veronii[i] * epsilon / absDen_forOrd$CommRich[i]
}
# re-order the levels of Heat for better plotting
absDen_forOrd$Heat <- factor(absDen_forOrd$Heat, levels=c("control", "6", "12", "24", "48"))
# finally, we can drop the total density column
absDen_forOrd <- absDen_forOrd %>% select(-Total_density)
rm(epsilon, missing_rows, i)
# first we have to widen the data:
# create a column that indicates the treatment day as resistance, early recovery, or late recovery
absDen_forOrd$trtmt_day <- "resist"
absDen_forOrd$trtmt_day[absDen_forOrd$Recov_Day == 1] <- "early_recov"
absDen_forOrd$trtmt_day[absDen_forOrd$Recov_Day == 2] <- "late_recov"
# ENTIRELY ARBITRARILY: I will keep days 1, 3, and 5 for control
absDen_forOrd$trtmt_day[absDen_forOrd$Heat == "control" & absDen_forOrd$Day == 3] <- "early_recov"
absDen_forOrd$trtmt_day[absDen_forOrd$Heat == "control" & absDen_forOrd$Day == 5] <- "late_recov"
# remove day 1 for 12h, 24h, 48h AND day 2 for 48h.
absDen_forOrd <- absDen_forOrd[!(absDen_forOrd$Heat == 12 & absDen_forOrd$Day == 1), ]
absDen_forOrd <- absDen_forOrd[!(absDen_forOrd$Heat == 24 & absDen_forOrd$Day == 1), ]
absDen_forOrd <- absDen_forOrd[!(absDen_forOrd$Heat == 48 & absDen_forOrd$Day == 1), ]
absDen_forOrd <- absDen_forOrd[!(absDen_forOrd$Heat == 48 & absDen_forOrd$Day == 2), ]
# also remove day 2 and day 4 for control.
absDen_forOrd <- absDen_forOrd[!(absDen_forOrd$Heat == "control" & absDen_forOrd$Day == 2), ]
absDen_forOrd <- absDen_forOrd[!(absDen_forOrd$Heat == "control" & absDen_forOrd$Day == 4), ]
# pivot wider to create a column for each of the 4 species on each of the 3 days
absDen_wide_forOrd <- absDen_forOrd %>% select(-Day, -Heat_Day, -Recov_Day) %>%
pivot_wider(names_from = trtmt_day,
values_from = c(Conc_putida, Conc_protegens, Conc_grimontii, Conc_veronii))
# re-name the species abundance over time columns so they are shorter (again for better plotting)
colnames(absDen_wide_forOrd)[9:20] <- c("Pu_Resist", "Pu_earlyR", "Pu_lateR",
"Pt_Resist", "Pt_earlyR", "Pt_lateR",
"Gi_Resist", "Gi_earlyR", "Gi_lateR",
"Vn_Resist", "Vn_earlyR", "Vn_lateR")
Now that we have the wide data, let’s calculate the distances and do the ordination with NMDS:
# The final result depends on the initial random placement of the points
# set seed to make the results reproducible
set.seed(64576)
# keep just the species abundances
abundance_matrix <- as.matrix(absDen_wide_forOrd[,9:20])
# a function to automatically run the NMDS for k = 1 to 10 so we can choose an appropriately small number of dimensions for the ordination
NMDS.scree <- function(mat) { # mat is the abundance matrix
data.frame(k = 1:10,
# autotransform the data before calculating the bray-curtis dissimilarity
stress = sapply(1:10, function(k) metaMDS(mat, distance = "bray", k = k, autotransform = TRUE)$stress))
}
scree_out <- NMDS.scree(abundance_matrix)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.2423303
## Run 1 stress 0.2929796
## Run 2 stress 0.2955769
## Run 3 stress 0.2514977
## Run 4 stress 0.2583327
## Run 5 stress 0.2520582
## Run 6 stress 0.2943351
## Run 7 stress 0.2621376
## Run 8 stress 0.3169983
## Run 9 stress 0.2648957
## Run 10 stress 0.3086427
## Run 11 stress 0.3007618
## Run 12 stress 0.2330685
## ... New best solution
## ... Procrustes: rmse 0.03139865 max resid 0.1011646
## Run 13 stress 0.3033715
## Run 14 stress 0.258922
## Run 15 stress 0.2490456
## Run 16 stress 0.2491303
## Run 17 stress 0.2540668
## Run 18 stress 0.2583461
## Run 19 stress 0.2690558
## Run 20 stress 0.3103607
## *** Best solution was not repeated -- monoMDS stopping criteria:
## 2: stress ratio > sratmax
## 18: scale factor of the gradient < sfgrmin
## (similar metaMDS run logs for k = 2 through 10 omitted; the best stress
## decreases from ~0.099 at k = 2 to ~0.009 at k = 10)
plot(scree_out)
# k=3 looks great
try.NMDS <- metaMDS(abundance_matrix, distance = "bray", k = 3, autotransform = TRUE, trymax=100)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.05653678
## Run 1 stress 0.05811525
## Run 2 stress 0.05828934
## Run 3 stress 0.0569203
## ... Procrustes: rmse 0.01578648 max resid 0.0501326
## Run 4 stress 0.0567028
## ... Procrustes: rmse 0.01516076 max resid 0.04484438
## Run 5 stress 0.05833604
## Run 6 stress 0.05667964
## ... Procrustes: rmse 0.01513649 max resid 0.04494314
## Run 7 stress 0.05660351
## ... Procrustes: rmse 0.0008001714 max resid 0.01397935
## Run 8 stress 0.05828482
## Run 9 stress 0.05667946
## ... Procrustes: rmse 0.01513509 max resid 0.04488485
## Run 10 stress 0.05668324
## ... Procrustes: rmse 0.0151378 max resid 0.04487964
## Run 11 stress 0.05834023
## Run 12 stress 0.05725508
## Run 13 stress 0.05653688
## ... Procrustes: rmse 0.0001937895 max resid 0.001162355
## ... Similar to previous best
## Run 14 stress 0.0565416
## ... Procrustes: rmse 0.0004543333 max resid 0.004691431
## ... Similar to previous best
## Run 15 stress 0.05811077
## Run 16 stress 0.05653636
## ... New best solution
## ... Procrustes: rmse 0.0001097333 max resid 0.001258089
## ... Similar to previous best
## Run 17 stress 0.05653649
## ... Procrustes: rmse 9.910708e-05 max resid 0.0007607637
## ... Similar to previous best
## Run 18 stress 0.05828872
## Run 19 stress 0.05713269
## Run 20 stress 0.05689516
## ... Procrustes: rmse 0.01582619 max resid 0.05162991
## *** Best solution repeated 2 times
# check the stress value. It should be < 0.2, ideally even < 0.05. (But too low stress values can indicate too many 0 values)
try.NMDS$stress
## [1] 0.05653636
# let's get a general idea of what this NMDS is separating...
# plot the results for axis 1 & 2
ordiplot(try.NMDS, type = "n") # create blank ordination plot
orditorp(try.NMDS, display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(try.NMDS, display = "species", col="red", air = 0.1) # add species names in red
# plot the results for axis 1 & 3
ordiplot(try.NMDS, choices = c(1,3), type = "n") # create blank ordination plot
orditorp(try.NMDS, choices = c(1,3), display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(try.NMDS, choices = c(1,3), display = "species", col="red", air = 0.1) # add species names in red
# plot the results for axis 2 & 3
ordiplot(try.NMDS, choices = c(2,3), type = "n") # create blank ordination plot
orditorp(try.NMDS, choices = c(2,3), display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(try.NMDS, choices = c(2,3), display = "species", col="red", air = 0.1) # add species names in red
# we already know that presence/absence of protegens is consistently the most important factor for all communities, so let's see if that shows up here.
# Let's switch over to ggplot to be certain that everything is labelled correctly.
# define a function (related to vegan) that finds coordinates for drawing a covariance ellipse
# CREDIT: this function comes from one of the tutorials referenced above
veganCovEllipse <- function(cov, center = c(0, 0), scale = 1, npoints = 100) {
  theta <- (0:npoints) * 2 * pi / npoints
  Circle <- cbind(cos(theta), sin(theta))
  t(center + scale * t(Circle %*% chol(cov)))
}
# below, this is used to find the centroid and dispersion of each ellipse based on a grouping factor of your choice
nmds_for_ggplot <- cbind(absDen_wide_forOrd[,1:8],
as.data.frame(scores(try.NMDS)$sites))
# create a new factor that defines the combination of heat and protegens
nmds_for_ggplot <- nmds_for_ggplot %>% unite("HeatxProtegens", c(Heat, protegens), remove = FALSE)
nmds_for_ggplot$HeatxProtegens <- factor(nmds_for_ggplot$HeatxProtegens,
levels = c("6_0", "6_1", "12_0", "12_1", "24_0", "24_1", "48_0", "48_1", "control_0", "control_1"))
# create empty dataframes to combine NMDS data with ellipse data
ellipse12_df <- ellipse13_df <- ellipse23_df <- data.frame() # numbers indicate the ordination axes
# adding data for ellipses, using HeatxProtegens as a grouping factor
for(g in levels(nmds_for_ggplot$HeatxProtegens)){
ellipse12_df <- rbind(ellipse12_df, cbind(as.data.frame(with(nmds_for_ggplot[nmds_for_ggplot$HeatxProtegens==g,],
veganCovEllipse(cov.wt(cbind(NMDS1, NMDS2),
wt=rep(1/length(NMDS1),length(NMDS1)))$cov,
center=c(mean(NMDS1),mean(NMDS2)))))
, HeatxProtegens=g))
ellipse13_df <- rbind(ellipse13_df, cbind(as.data.frame(with(nmds_for_ggplot[nmds_for_ggplot$HeatxProtegens==g,],
veganCovEllipse(cov.wt(cbind(NMDS1, NMDS3),
wt=rep(1/length(NMDS1),length(NMDS1)))$cov,
center=c(mean(NMDS1),mean(NMDS3)))))
, HeatxProtegens=g))
ellipse23_df <- rbind(ellipse23_df, cbind(as.data.frame(with(nmds_for_ggplot[nmds_for_ggplot$HeatxProtegens==g,],
veganCovEllipse(cov.wt(cbind(NMDS2, NMDS3),
wt=rep(1/length(NMDS2),length(NMDS2)))$cov,
center=c(mean(NMDS2),mean(NMDS3)))))
, HeatxProtegens=g))
}
# now we separate the HeatxProtegens columns:
ellipse12_df <- ellipse12_df %>% separate(HeatxProtegens, c("Heat", "protegens"))
ellipse12_df$Heat <- factor(ellipse12_df$Heat, levels = levels(nmds_for_ggplot$Heat))
ellipse13_df <- ellipse13_df %>% separate(HeatxProtegens, c("Heat", "protegens"))
ellipse13_df$Heat <- factor(ellipse13_df$Heat, levels = levels(nmds_for_ggplot$Heat))
ellipse23_df <- ellipse23_df %>% separate(HeatxProtegens, c("Heat", "protegens"))
ellipse23_df$Heat <- factor(ellipse23_df$Heat, levels = levels(nmds_for_ggplot$Heat))
nmds_for_ggplot$protegens <- as.character(nmds_for_ggplot$protegens) # this needs to be discrete (could also be a factor)
# and finally we can make the plots:
ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS2)) +
geom_point(aes(color = Heat, shape = protegens), alpha=0.4) + # adding different colours and shapes for points at different distances
geom_path(data=ellipse12_df, aes(x=NMDS1, y=NMDS2, colour=Heat, linetype=protegens), linewidth=1) + # adding covariance ellipses according to distance # use size argument if ggplot2 < v. 3.4.0
guides(color = guide_legend(override.aes = list(linetype=rep(NA,5)))) + # removes lines from colour part of the legend
scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
fave_theme + # not sure why I need this but I do to over-write the default grey theme
labs(title="NMDS of all data (4sp & 3 time-points)")
# axes 1 & 2 again showing just the ellipses (bc it's hard to see protegens effects as it's so overlapped)
ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS2)) +
geom_path(data=ellipse12_df, aes(x=NMDS1, y=NMDS2, colour=Heat, linetype=protegens), linewidth=1) + # plot just the ellipses
scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
fave_theme + # not sure why I need this but I do to over-write the default grey theme
labs(title="NMDS of all data (4sp & 3 time-points)")
ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS3)) +
geom_point(aes(color = Heat, shape = protegens), alpha=0.4) +
geom_path(data=ellipse13_df, aes(x=NMDS1, y=NMDS3, colour=Heat, linetype=protegens), linewidth=1) +
guides(color = guide_legend(override.aes = list(linetype=rep(NA,5)))) +
scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
fave_theme +
labs(title="NMDS of all data (4sp & 3 time-points)")
# axes 1 & 3 again showing just the ellipses
ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS3)) +
geom_path(data=ellipse13_df, aes(x=NMDS1, y=NMDS3, colour=Heat, linetype=protegens), linewidth=1) +
scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
fave_theme +
labs(title="NMDS of all data (4sp & 3 time-points)")
ggplot(data = nmds_for_ggplot, aes(NMDS2, NMDS3)) +
geom_point(aes(color = Heat, shape = protegens), alpha=0.4) +
geom_path(data=ellipse23_df, aes(x=NMDS2, y=NMDS3, colour=Heat, linetype=protegens), linewidth=1) +
guides(color = guide_legend(override.aes = list(linetype=rep(NA,5)))) +
scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
fave_theme +
labs(title="NMDS of all data (4sp & 3 time-points)")
# axes 2 & 3 again showing just the ellipses (bc it's hard to see protegens effects as it's so overlapped)
ggplot(data = nmds_for_ggplot, aes(NMDS2, NMDS3)) +
geom_path(data=ellipse23_df, aes(x=NMDS2, y=NMDS3, colour=Heat, linetype=protegens), linewidth=1) +
scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
fave_theme +
labs(title="NMDS of all data (4sp & 3 time-points)")
################
# check significance:
# using a PERMANOVA to test the differences in community composition
# PERMANOVA (PERmutational Multivariate ANalysis Of VAriance) tests for differences between groups, like an ANOVA, but with many response variables at once.
# it is essentially a multivariate analysis of variance used to compare groups of objects
nmdsdata_test_Heat <- adonis2(abundance_matrix ~ Heat, absDen_wide_forOrd,
permutations = 999, method = "bray")
print(nmdsdata_test_Heat)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = abundance_matrix ~ Heat, data = absDen_wide_forOrd, permutations = 999, method = "bray")
## Df SumOfSqs R2 F Pr(>F)
## Model 4 8.188 0.07534 6.4981 0.001 ***
## Residual 319 100.485 0.92466
## Total 323 108.673 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
nmdsdata_test_Prot <- adonis2(abundance_matrix ~ protegens, absDen_wide_forOrd,
permutations = 999, method = "bray")
print(nmdsdata_test_Prot)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = abundance_matrix ~ protegens, data = absDen_wide_forOrd, permutations = 999, method = "bray")
## Df SumOfSqs R2 F Pr(>F)
## Model 1 45.211 0.41603 229.39 0.001 ***
## Residual 322 63.462 0.58397
## Total 323 108.673 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
nmdsdata_test_HeatxProt <- adonis2(abundance_matrix ~ Heat * protegens, absDen_wide_forOrd,
permutations = 999, method = "bray")
print(nmdsdata_test_HeatxProt)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = abundance_matrix ~ Heat * protegens, data = absDen_wide_forOrd, permutations = 999, method = "bray")
## Df SumOfSqs R2 F Pr(>F)
## Model 9 60.966 0.56101 44.586 0.001 ***
## Residual 314 47.706 0.43899
## Total 323 108.673 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
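To make the adonis2 output above more concrete: for the one-way case, the pseudo-F can be sketched directly from the distance matrix, following Anderson's (2001) partitioning of sums of squared dissimilarities. This is only an illustration of the statistic (`permanova_F` is a made-up helper name); adonis2 should still be used for the real test, since it also computes the permutation p-value and handles multi-factor designs.

```r
# Minimal sketch of the one-way PERMANOVA pseudo-F (illustration only).
# SS_total = (1/N) * sum over all pairs of d_ij^2;
# SS_within = sum over groups of (1/n_g) * sum of within-group d_ij^2;
# F = ((SS_total - SS_within) / (a - 1)) / (SS_within / (N - a)), for a groups.
permanova_F <- function(d, grouping) {
  d2 <- as.matrix(d)^2
  n <- nrow(d2)
  ss_total <- sum(d2[lower.tri(d2)]) / n
  ss_within <- 0
  for (g in unique(grouping)) {
    idx <- grouping == g
    dg <- d2[idx, idx]
    ss_within <- ss_within + sum(dg[lower.tri(dg)]) / sum(idx)
  }
  a <- length(unique(grouping))  # number of groups
  ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))
}
```

Applied to the Bray-Curtis distances with a single grouping factor, this should reproduce the F value that adonis2 reports for a one-term model.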
# so these are all significant, but is that spurious because the dispersion differs between groups? (e.g., much smaller for protegens)
##############
# check PERMANOVA assumption of homogeneous group variances
# Bray-Curtis distance matrix
dist_mat <- vegdist(abundance_matrix, method = "bray")
# use betadisper test to check for multivariate homogeneity of group variances
dispersion <- betadisper(dist_mat, group = paste(absDen_wide_forOrd$Heat, absDen_wide_forOrd$protegens))
## Warning in betadisper(dist_mat, group = paste(absDen_wide_forOrd$Heat,
## absDen_wide_forOrd$protegens)): some squared distances are negative and changed
## to zero
permutest(dispersion)
##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 9 8.7974 0.97749 23.077 999 0.001 ***
## Residuals 314 13.3006 0.04236
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# yep! The dispersions differ, so we need to try a different test that is robust to heterogeneous group variances...
################
# check significance:
# let's test for significance again using ANOSIM (another non-parametric test, but one that considers only the ranks of the dissimilarities)
nmdsdata_test2_HeatxProt <- anosim(dist_mat,
grouping = paste(absDen_wide_forOrd$Heat, absDen_wide_forOrd$protegens),
permutations = 999)
plot(nmdsdata_test2_HeatxProt)
## Warning in (function (z, notch = FALSE, width = NULL, varwidth = FALSE, : some
## notches went outside hinges ('box'): maybe set notch=FALSE
summary(nmdsdata_test2_HeatxProt)
##
## Call:
## anosim(x = dist_mat, grouping = paste(absDen_wide_forOrd$Heat, absDen_wide_forOrd$protegens), permutations = 999)
## Dissimilarity: bray
##
## ANOSIM statistic R: 0.6212
## Significance: 0.001
##
## Permutation: free
## Number of permutations: 999
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.0110 0.0150 0.0173 0.0204
##
## Dissimilarity ranks between and within classes:
## 0% 25% 50% 75% 100% N
## Between 15.5 15656.00 28220.0 44062.5 44062.5 47065
## 12 0 156.0 4841.00 22156.0 25514.0 44062.5 253
## 12 1 37.0 2073.25 6635.0 10519.0 18965.0 780
## 24 0 15.5 6506.00 22860.0 44062.5 44062.5 465
## 24 1 102.0 3052.00 5746.0 9735.5 17602.0 595
## 48 0 15.5 15176.00 26611.5 44062.5 44062.5 406
## 48 1 33.0 1305.00 3454.0 8128.5 20766.0 595
## 6 0 88.0 6343.00 21367.0 44062.5 44062.5 561
## 6 1 35.0 1550.50 3749.0 6333.5 19718.0 780
## control 0 685.0 8820.00 20181.0 20994.5 44062.5 231
## control 1 32.0 729.00 1856.0 4199.5 17322.0 595
Great! This summarizes the same result that I found with the other indices: the presence of P. protegens is the most important factor. Communities where this species was present look quite similar across the different heat treatments. Longer heat durations push the communities in a different direction, until a threshold is reached at the longest heat treatment (48h).
The NMDS ordination results are significant by PERMANOVA, but the assumptions of that test may be violated because the dispersion is heterogeneous between groups. I think ANOSIM should be somewhat more robust to this problem because it uses ranks. The NMDS ordination results are also significant by ANOSIM.
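Because ANOSIM's robustness comes from its rank-based statistic, it may help to see what that statistic actually is. A minimal sketch (`anosim_R` is a made-up helper name; the real analysis uses vegan::anosim, which also runs the permutation test):

```r
# Minimal sketch of the ANOSIM R statistic (illustration only).
# Rank all pairwise dissimilarities, then compare the mean rank of
# between-group pairs to the mean rank of within-group pairs:
# R = (mean between rank - mean within rank) / (N * (N - 1) / 4).
# R near 1 means between-group pairs are consistently more dissimilar.
anosim_R <- function(d, grouping) {
  d <- as.dist(d)
  n <- attr(d, "Size")
  r <- rank(d)  # ranks of all pairwise dissimilarities
  # lower-triangle indicator of within-group pairs, in the same order as d
  within <- as.vector(as.dist(outer(grouping, grouping, "==") * 1)) == 1
  (mean(r[!within]) - mean(r[within])) / (n * (n - 1) / 4)
}
```

With `dist_mat` and the pasted Heat/protegens grouping from above, this should reproduce the R = 0.6212 reported in the ANOSIM summary.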
################################
# Plot figure for main text: Figure 3b
################################
# change protegens values for better plotting
nmds_for_ggplot$P_protegens <- "absent"
nmds_for_ggplot$P_protegens[nmds_for_ggplot$protegens == 1] <- "present"
ellipse12_df$P_protegens <- "absent"
ellipse12_df$P_protegens[ellipse12_df$protegens == 1] <- "present"
ellipse13_df$P_protegens <- "absent"
ellipse13_df$P_protegens[ellipse13_df$protegens == 1] <- "present"
# change Heat values for better plotting
levels(nmds_for_ggplot$Heat)[2:5] <- paste(levels(nmds_for_ggplot$Heat)[2:5], "hrs")
levels(ellipse12_df$Heat)[2:5] <- paste(levels(ellipse12_df$Heat)[2:5], "hrs")
levels(ellipse13_df$Heat)[2:5] <- paste(levels(ellipse13_df$Heat)[2:5], "hrs")
# create the plot of 1 vs 2:
plot1_2 <- ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS2)) +
geom_point(aes(color = Heat, shape = P_protegens), size=2, alpha=0.4) + # adding different colours and shapes for points at different distances
geom_path(data=ellipse12_df, aes(x=NMDS1, y=NMDS2, colour=Heat, linetype=P_protegens), linewidth=1) + # adding covariance ellipses according to distance # use size argument if ggplot2 < v. 3.4.0
guides(color = guide_legend(override.aes = list(linetype=rep(NA,5),# removes lines from colour part of the legend
alpha=1, size=3)), # make the points opaque and bigger in the colour part of the legend
shape = guide_legend(override.aes = list(size=3))) + # make the points bigger in the greyscale part of the legend
scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9)
# plot 1 vs 2 with the legend ... I will extract the legend from here
png(filename="./figures/Fig3_A_legend.png", width = 3.48, height = 3.41, units = "in", res=300)
print(plot1_2)
dev.off()
## png
## 2
# plot 1 vs 2 without the legend
png(filename="./figures/Fig3_A_axis1vs2.png", width = 5.35, height = 3.78, units = "in", res=300)
print(plot1_2 + theme(legend.position="none"))
dev.off()
## png
## 2
# plot 1 vs 3 without the legend
png(filename="./figures/Fig3_A_axis1vs3.png", width = 5.35, height = 3.78, units = "in", res=300)
ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS3)) +
geom_point(aes(color = Heat, shape = P_protegens), size=2, alpha=0.4) +
geom_path(data=ellipse13_df, aes(x=NMDS1, y=NMDS3, colour=Heat, linetype=P_protegens), linewidth=1) +
scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
theme(legend.position="none")
dev.off()
## png
## 2
# let's check if the gradient of heat pulse duration is significant:
absDen_wide_forOrd$Heat <- as.character(levels(absDen_wide_forOrd$Heat))[absDen_wide_forOrd$Heat]
absDen_wide_forOrd$Heat[absDen_wide_forOrd$Heat == "control"] <- 0
absDen_wide_forOrd$Heat <- as.numeric(absDen_wide_forOrd$Heat)
# let's see what the heat gradient looks like
gg_ordiplot(try.NMDS, groups = absDen_wide_forOrd$protegens, plot = TRUE)
# here's another way to do it:
gg_envfit(try.NMDS, env = absDen_wide_forOrd$Heat, groups = absDen_wide_forOrd$protegens, plot = TRUE, alpha=0.5) # notice this gradient is not significant!!!
gg_envfit(try.NMDS, env = absDen_wide_forOrd$Heat, groups = absDen_wide_forOrd$protegens, plot = TRUE, alpha=0.5, choices=c(1,3)) # notice this gradient is not significant!!!
gg_envfit(try.NMDS, env = absDen_wide_forOrd$Heat, groups = absDen_wide_forOrd$protegens, plot = TRUE, alpha=0.5, choices=c(2,3)) # notice this gradient is not significant!!!
# display the p-value:
gg_envfit(try.NMDS, env = absDen_wide_forOrd$Heat, groups = absDen_wide_forOrd$protegens, alpha=0.5, plot = FALSE)$df_arrows$p.val
## [1] 0.295
The gradient is NOT significant.
Let’s re-do the analysis, this time removing the 48h duration, to check whether the gradient becomes significant.
# exclude 48h duration data
absDen_wide_forOrd_no48 <- absDen_wide_forOrd %>% filter(Heat < 48)
abundance_mat_no48 <- as.matrix(absDen_wide_forOrd_no48[,9:20])
# re-do the NMDS with k=3
NMDS_no48 <- metaMDS(abundance_mat_no48, distance = "bray", k = 3, autotransform = TRUE, trymax=100)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.03610698
## Run 1 stress 0.0361706
## ... Procrustes: rmse 0.004523705 max resid 0.0209621
## Run 2 stress 0.03587647
## ... New best solution
## ... Procrustes: rmse 0.008280525 max resid 0.03289523
## Run 3 stress 0.03586422
## ... New best solution
## ... Procrustes: rmse 0.009772168 max resid 0.03282395
## Run 4 stress 0.0361075
## ... Procrustes: rmse 0.0049532 max resid 0.02134297
## Run 5 stress 0.0361076
## ... Procrustes: rmse 0.004952867 max resid 0.02135668
## Run 6 stress 0.03632809
## ... Procrustes: rmse 0.009324124 max resid 0.03207945
## Run 7 stress 0.03792442
## Run 8 stress 0.03587761
## ... Procrustes: rmse 0.009770886 max resid 0.03279242
## Run 9 stress 0.0361707
## ... Procrustes: rmse 0.001574116 max resid 0.02088598
## Run 10 stress 0.04846144
## Run 11 stress 0.03610699
## ... Procrustes: rmse 0.004969727 max resid 0.02132521
## Run 12 stress 0.03624383
## ... Procrustes: rmse 0.008163905 max resid 0.03274627
## Run 13 stress 0.03656112
## Run 14 stress 0.04613172
## Run 15 stress 0.0363021
## ... Procrustes: rmse 0.00152127 max resid 0.02025857
## Run 16 stress 0.03610761
## ... Procrustes: rmse 0.004952837 max resid 0.02135751
## Run 17 stress 0.03617057
## ... Procrustes: rmse 0.001576836 max resid 0.02086885
## Run 18 stress 0.03585721
## ... New best solution
## ... Procrustes: rmse 0.005268631 max resid 0.02116026
## Run 19 stress 0.03595363
## ... Procrustes: rmse 0.00973805 max resid 0.03298728
## Run 20 stress 0.03775638
## Run 21 stress 0.03776485
## Run 22 stress 0.03587669
## ... Procrustes: rmse 0.008254363 max resid 0.03305972
## Run 23 stress 0.04624875
## Run 24 stress 0.03587662
## ... Procrustes: rmse 0.008228895 max resid 0.03289349
## Run 25 stress 0.03607472
## ... Procrustes: rmse 0.004978904 max resid 0.02099788
## Run 26 stress 0.03607386
## ... Procrustes: rmse 0.004984494 max resid 0.02107825
## Run 27 stress 0.03617053
## ... Procrustes: rmse 0.004835244 max resid 0.02103437
## Run 28 stress 0.03820177
## Run 29 stress 0.03607535
## ... Procrustes: rmse 0.004969459 max resid 0.02114723
## Run 30 stress 0.04853968
## Run 31 stress 0.03784573
## Run 32 stress 0.03607467
## ... Procrustes: rmse 0.004986658 max resid 0.02111265
## Run 33 stress 0.03586417
## ... Procrustes: rmse 0.005263557 max resid 0.02144332
## Run 34 stress 0.0363281
## ... Procrustes: rmse 0.008242741 max resid 0.03229225
## Run 35 stress 0.03595435
## ... Procrustes: rmse 0.009738699 max resid 0.03296331
## Run 36 stress 0.03645808
## Run 37 stress 0.03595357
## ... Procrustes: rmse 0.009745052 max resid 0.03299968
## Run 38 stress 0.03610661
## ... Procrustes: rmse 0.00135609 max resid 0.01935981
## Run 39 stress 0.0363037
## ... Procrustes: rmse 0.004650823 max resid 0.02069567
## Run 40 stress 0.03587627
## ... Procrustes: rmse 0.008237771 max resid 0.03288605
## Run 41 stress 0.03624235
## ... Procrustes: rmse 0.009515786 max resid 0.03278193
## Run 42 stress 0.03609896
## ... Procrustes: rmse 0.008214127 max resid 0.03256114
## Run 43 stress 0.03595322
## ... Procrustes: rmse 0.009745371 max resid 0.03298626
## Run 44 stress 0.03610723
## ... Procrustes: rmse 0.001331733 max resid 0.01934939
## Run 45 stress 0.03587669
## ... Procrustes: rmse 0.008254437 max resid 0.03306022
## Run 46 stress 0.03617061
## ... Procrustes: rmse 0.004865983 max resid 0.0210081
## Run 47 stress 0.03586504
## ... Procrustes: rmse 0.005265935 max resid 0.02146238
## Run 48 stress 0.03617104
## ... Procrustes: rmse 0.004845602 max resid 0.02091001
## Run 49 stress 0.03586405
## ... Procrustes: rmse 0.005264017 max resid 0.0214265
## Run 50 stress 0.04703046
## Run 51 stress 0.0362726
## ... Procrustes: rmse 0.009413914 max resid 0.03248227
## Run 52 stress 0.03607465
## ... Procrustes: rmse 0.004966884 max resid 0.02113944
## Run 53 stress 0.03610716
## ... Procrustes: rmse 0.001328212 max resid 0.0193493
## Run 54 stress 0.03586437
## ... Procrustes: rmse 0.00527586 max resid 0.02152297
## Run 55 stress 0.03631954
## ... Procrustes: rmse 0.005103146 max resid 0.02342217
## Run 56 stress 0.05410207
## Run 57 stress 0.03586391
## ... Procrustes: rmse 0.005277148 max resid 0.02147091
## Run 58 stress 0.03607545
## ... Procrustes: rmse 0.0049688 max resid 0.02116238
## Run 59 stress 0.03586361
## ... Procrustes: rmse 0.005278219 max resid 0.02142871
## Run 60 stress 0.03608585
## ... Procrustes: rmse 0.008242746 max resid 0.03287899
## Run 61 stress 0.03587691
## ... Procrustes: rmse 0.008255639 max resid 0.03306867
## Run 62 stress 0.0360749
## ... Procrustes: rmse 0.004984373 max resid 0.02114165
## Run 63 stress 0.03586413
## ... Procrustes: rmse 0.005263647 max resid 0.02143811
## Run 64 stress 0.03610729
## ... Procrustes: rmse 0.001402614 max resid 0.01934451
## Run 65 stress 0.03610706
## ... Procrustes: rmse 0.00139432 max resid 0.01934721
## Run 66 stress 0.03667862
## Run 67 stress 0.03827359
## Run 68 stress 0.03585661
## ... New best solution
## ... Procrustes: rmse 0.0001460725 max resid 0.0004428963
## ... Similar to previous best
## *** Best solution repeated 1 times
# check the stress value. For the data with 48h it was 0.05653636
NMDS_no48$stress # it's smaller than before
## [1] 0.03585661
# let's get a general idea of what this NMDS is separating...
# plot the results for axis 1 & 2
ordiplot(NMDS_no48, type = "n") # create blank ordination plot
orditorp(NMDS_no48, display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(NMDS_no48, display = "species", col="red", air = 0.1) # add species names in red
# plot the results for axis 1 & 3
ordiplot(NMDS_no48, choices = c(1,3), type = "n") # create blank ordination plot
orditorp(NMDS_no48, choices = c(1,3), display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(NMDS_no48, choices = c(1,3), display = "species", col="red", air = 0.1) # add species names in red
# plot the results for axis 2 & 3
ordiplot(NMDS_no48, choices = c(2,3), type = "n") # create blank ordination plot
orditorp(NMDS_no48, choices = c(2,3), display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(NMDS_no48, choices = c(2,3), display = "species", col="red", air = 0.1) # add species names in red
# test for significance
gg_envfit(NMDS_no48, env = absDen_wide_forOrd_no48$Heat, groups = absDen_wide_forOrd_no48$protegens, plot = TRUE, alpha=0.5)
gg_envfit(NMDS_no48, env = absDen_wide_forOrd_no48$Heat, groups = absDen_wide_forOrd_no48$protegens, plot = TRUE, alpha=0.5)$df_arrows$p.val
## [1] 0.224
The NMDS looks similar to when the 48h heat pulse data were included, though not exactly the same, as is to be expected. The gradient is still NOT significant when we remove all of the 48h heat pulse duration data.
# exclude only the 48h wells where protegens was absent (i.e., keep the 48h wells that included protegens)
absDen_wide_forOrd_no48 <- rbind(absDen_wide_forOrd %>% filter(Heat < 48),
absDen_wide_forOrd %>% filter(Heat == 48, protegens == 1))
abundance_mat_no48 <- as.matrix(absDen_wide_forOrd_no48[,9:20])
# re-do the NMDS with k=3
NMDS_no48 <- metaMDS(abundance_mat_no48, distance = "bray", k = 3, autotransform = TRUE, trymax=100)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.03706032
## Run 1 stress 0.04547765
## Run 2 stress 0.05064102
## Run 3 stress 0.03716541
## ... Procrustes: rmse 0.01666652 max resid 0.05646784
## Run 4 stress 0.03703048
## ... New best solution
## ... Procrustes: rmse 0.01560765 max resid 0.04445792
## Run 5 stress 0.03703069
## ... Procrustes: rmse 0.0002904334 max resid 0.004232298
## ... Similar to previous best
## Run 6 stress 0.03719355
## ... Procrustes: rmse 0.0008520842 max resid 0.01321004
## Run 7 stress 0.03716485
## ... Procrustes: rmse 0.006032667 max resid 0.03788253
## Run 8 stress 0.03719313
## ... Procrustes: rmse 0.0007887533 max resid 0.01320485
## Run 9 stress 0.03707538
## ... Procrustes: rmse 0.01557246 max resid 0.04406662
## Run 10 stress 0.05222347
## Run 11 stress 0.03719385
## ... Procrustes: rmse 0.0007832657 max resid 0.01322475
## Run 12 stress 0.0370302
## ... New best solution
## ... Procrustes: rmse 0.0003310697 max resid 0.004292492
## ... Similar to previous best
## Run 13 stress 0.03758154
## Run 14 stress 0.03706025
## ... Procrustes: rmse 0.01560889 max resid 0.04371931
## Run 15 stress 0.03706038
## ... Procrustes: rmse 0.01560757 max resid 0.0434899
## Run 16 stress 0.03738869
## ... Procrustes: rmse 0.005864962 max resid 0.03818162
## Run 17 stress 0.03716565
## ... Procrustes: rmse 0.006034456 max resid 0.03822598
## Run 18 stress 0.05054459
## Run 19 stress 0.0372091
## ... Procrustes: rmse 0.01559203 max resid 0.0438836
## Run 20 stress 0.04326073
## *** Best solution repeated 1 times
# check the stress value. For the data with 48h it was 0.05653636
NMDS_no48$stress # it's smaller than before
## [1] 0.0370302
# let's get a general idea of what this NMDS is separating...
# plot the results for axis 1 & 2
ordiplot(NMDS_no48, type = "n") # create blank ordination plot
orditorp(NMDS_no48, display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(NMDS_no48, display = "species", col="red", air = 0.1) # add species names in red
# plot the results for axis 1 & 3
ordiplot(NMDS_no48, choices = c(1,3), type = "n") # create blank ordination plot
orditorp(NMDS_no48, choices = c(1,3), display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(NMDS_no48, choices = c(1,3), display = "species", col="red", air = 0.1) # add species names in red
# plot the results for axis 2 & 3
ordiplot(NMDS_no48, choices = c(2,3), type = "n") # create blank ordination plot
orditorp(NMDS_no48, choices = c(2,3), display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(NMDS_no48, choices = c(2,3), display = "species", col="red", air = 0.1) # add species names in red
# test for significance of environmental vector:
gg_envfit(NMDS_no48, env = absDen_wide_forOrd_no48$Heat, groups = absDen_wide_forOrd_no48$protegens, plot = TRUE)
gg_envfit(NMDS_no48, env = absDen_wide_forOrd_no48$Heat, groups = absDen_wide_forOrd_no48$protegens, plot = TRUE, choices=c(1,3))
gg_envfit(NMDS_no48, env = absDen_wide_forOrd_no48$Heat, groups = absDen_wide_forOrd_no48$protegens, plot = TRUE, choices=c(2,3))
# print out the p-value
gg_envfit(NMDS_no48, env = absDen_wide_forOrd_no48$Heat, groups = absDen_wide_forOrd_no48$protegens, plot = FALSE, alpha=0.5)$df_arrows$p.val
## [1] 0.001
# okay now that we see this is significant, let's make a pretty plot of the NMDS to include in the supplement
# first we have to return Heat to a factor with appropriate levels
absDen_wide_forOrd_no48$Heat <- factor(absDen_wide_forOrd_no48$Heat,
levels = c(0, 6, 12, 24, 48))
levels(absDen_wide_forOrd_no48$Heat)[1] <- "control"
nmds_for_ggplot <- cbind(absDen_wide_forOrd_no48[,1:8],
as.data.frame(scores(NMDS_no48)$sites))
# create a new factor that defines the combination of heat and protegens
nmds_for_ggplot <- nmds_for_ggplot %>% unite("HeatxProtegens", c(Heat, protegens), remove = FALSE)
nmds_for_ggplot$HeatxProtegens <- factor(nmds_for_ggplot$HeatxProtegens,
levels = c("6_0", "6_1", "12_0", "12_1", "24_0", "24_1", "48_1", "control_0", "control_1"))
# create empty dataframes to combine NMDS data with ellipse data
ellipse12_df <- ellipse13_df <- ellipse23_df <- data.frame() # numbers indicate the ordination axes
# adding data for ellipses, using HeatxProtegens as a grouping factor
for(g in levels(nmds_for_ggplot$HeatxProtegens)){
ellipse12_df <- rbind(ellipse12_df, cbind(as.data.frame(with(nmds_for_ggplot[nmds_for_ggplot$HeatxProtegens==g,],
veganCovEllipse(cov.wt(cbind(NMDS1, NMDS2),
wt=rep(1/length(NMDS1),length(NMDS1)))$cov,
center=c(mean(NMDS1),mean(NMDS2)))))
, HeatxProtegens=g))
ellipse13_df <- rbind(ellipse13_df, cbind(as.data.frame(with(nmds_for_ggplot[nmds_for_ggplot$HeatxProtegens==g,],
veganCovEllipse(cov.wt(cbind(NMDS1, NMDS3),
wt=rep(1/length(NMDS1),length(NMDS1)))$cov,
center=c(mean(NMDS1),mean(NMDS3)))))
, HeatxProtegens=g))
ellipse23_df <- rbind(ellipse23_df, cbind(as.data.frame(with(nmds_for_ggplot[nmds_for_ggplot$HeatxProtegens==g,],
veganCovEllipse(cov.wt(cbind(NMDS2, NMDS3),
wt=rep(1/length(NMDS2),length(NMDS2)))$cov,
center=c(mean(NMDS2),mean(NMDS3)))))
, HeatxProtegens=g))
}
# now we separate the HeatxProtegens columns:
ellipse12_df <- ellipse12_df %>% separate(HeatxProtegens, c("Heat", "protegens"))
ellipse12_df$Heat <- factor(ellipse12_df$Heat,
levels = levels(nmds_for_ggplot$Heat))
ellipse13_df <- ellipse13_df %>% separate(HeatxProtegens, c("Heat", "protegens"))
ellipse13_df$Heat <- factor(ellipse13_df$Heat, levels = levels(nmds_for_ggplot$Heat))
ellipse23_df <- ellipse23_df %>% separate(HeatxProtegens, c("Heat", "protegens"))
ellipse23_df$Heat <- factor(ellipse23_df$Heat, levels = levels(nmds_for_ggplot$Heat))
nmds_for_ggplot$protegens <- as.character(nmds_for_ggplot$protegens) # this needs to be discrete (could also be a factor)
# and finally we can make the plots:
ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS2)) +
geom_point(aes(color = Heat, shape = protegens), alpha=0.4) + # adding different colours and shapes for points at different distances
geom_path(data=ellipse12_df, aes(x=NMDS1, y=NMDS2, colour=Heat, linetype=protegens), linewidth=1) + # adding covariance ellipses according to distance # use size argument if ggplot2 < v. 3.4.0
guides(color = guide_legend(override.aes = list(linetype=rep(NA,5)))) + # removes lines from colour part of the legend
scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
fave_theme + # not sure why I need this but I do to over-write the default grey theme
labs(title="exclude: 48h without protegens")
# axis 1 vs 3
ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS3)) +
geom_point(aes(color = Heat, shape = protegens), alpha=0.4) +
geom_path(data=ellipse13_df, aes(x=NMDS1, y=NMDS3, colour=Heat, linetype=protegens), linewidth=1) +
guides(color = guide_legend(override.aes = list(linetype=rep(NA,5)))) +
scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
fave_theme +
labs(title="exclude: 48h without protegens")
# axis 2 vs 3
ggplot(data = nmds_for_ggplot, aes(NMDS2, NMDS3)) +
geom_point(aes(color = Heat, shape = protegens), alpha=0.4) +
geom_path(data=ellipse23_df, aes(x=NMDS2, y=NMDS3, colour=Heat, linetype=protegens), linewidth=1) +
guides(color = guide_legend(override.aes = list(linetype=rep(NA,5)))) +
scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
fave_theme +
labs(title="exclude: 48h without protegens")
# clean up
rm(abundance_matrix, scree_out, try.NMDS, nmds_for_ggplot, ellipse12_df, ellipse13_df, ellipse23_df, nmdsdata_test_Heat, nmdsdata_test_Prot, nmdsdata_test_HeatxProt, dist_mat, dispersion, nmdsdata_test2_HeatxProt, absDen_forOrd, absDen_wide_forOrd, g, plot1_2, absDen_wide_forOrd_no48, abundance_mat_no48, NMDS_no48)
Let’s summarize the main result that P. protegens dominates all communities where it was inoculated. Species richness is conceptually a good metric for this … but recall that the flow cytometry data has some rate of misclassification (in some cases as much as 20%!). So we need to use richness estimates that take the proportions of species into account and are more likely to ignore rare species.
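Hill numbers are exactly such estimates, and writing one out by hand makes clear why a larger order q "ignores" rare species. This is only an illustrative sketch (`hill_diversity` is a made-up helper name; the analysis below uses calcDiv()):

```r
# Minimal sketch of Hill diversity of order q from relative abundances.
# D_q = (sum_i p_i^q)^(1 / (1 - q)); at q = 1 the limit is exp(Shannon entropy).
hill_diversity <- function(p, q) {
  p <- p[p > 0]  # drop absent species
  if (q == 1) {
    exp(-sum(p * log(p)))  # q = 1: exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}
# Larger q down-weights rare species: a community with relative abundances
# (0.97, 0.01, 0.01, 0.01) has D_0 = 4 (raw richness, counting every species),
# but D_2 close to 1 (effectively one dominant species).
hill_diversity(c(0.97, 0.01, 0.01, 0.01), q = 0)  # 4
hill_diversity(c(0.97, 0.01, 0.01, 0.01), q = 2)  # ~1.06
```

So with a ~1-20% misclassification rate, q = 1 or q = 2 should be much less sensitive to spurious "rare" species than raw richness (q = 0).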
species_div.df <- absDen_forFit %>% mutate(relden_putida = Conc_putida/Total_density,
relden_protegens = Conc_protegens/Total_density,
relden_grimontii = Conc_grimontii/Total_density,
relden_veronii = Conc_veronii/Total_density)
species_div.df <- species_div.df %>% mutate(HillEven_q0 = unlist(calcDiv(sampleData = species_div.df[,c("relden_putida","relden_protegens","relden_grimontii","relden_veronii")],
type = "HillEven",
q=0)),
HillEven_q1 = unlist(calcDiv(sampleData = species_div.df[,c("relden_putida","relden_protegens","relden_grimontii","relden_veronii")],
type = "HillEven",
q=1)),
HillEven_q2 = unlist(calcDiv(sampleData = species_div.df[,c("relden_putida","relden_protegens","relden_grimontii","relden_veronii")],
type = "HillEven",
q=2)),
HillDiv_q1 = unlist(calcDiv(sampleData = species_div.df[,c("relden_putida","relden_protegens","relden_grimontii","relden_veronii")],
type = "HillDiv",
q=1)),
HillDiv_q2 = unlist(calcDiv(sampleData = species_div.df[,c("relden_putida","relden_protegens","relden_grimontii","relden_veronii")],
type = "HillDiv",
q=2)))
ggplot(species_div.df,
aes(y=HillEven_q0, x=Day, colour=as.factor(CommRich))) +
facet_grid(protegens~as.factor(Heat)) +
geom_point(alpha=0.5) +
scale_colour_viridis_d(option = "viridis", begin=0.1)
## Warning: Removed 110 rows containing missing values or values outside the scale range
## (`geom_point()`).
ggplot(species_div.df,
aes(y=HillEven_q1, x=Day, colour=as.factor(CommRich))) +
facet_grid(protegens~as.factor(Heat)) +
geom_point(alpha=0.5) +
scale_colour_viridis_d(option = "viridis", begin=0.1)
## Warning: Removed 110 rows containing missing values or values outside the scale range
## (`geom_point()`).
ggplot(species_div.df,
aes(y=HillEven_q2, x=Day, colour=as.factor(CommRich))) +
facet_grid(protegens~as.factor(Heat)) +
geom_point(alpha=0.5) +
scale_colour_viridis_d(option = "viridis", begin=0.1)
## Warning: Removed 110 rows containing missing values or values outside the scale range
## (`geom_point()`).
ggplot(species_div.df,
aes(y=HillDiv_q1, x=Day, colour=as.factor(CommRich))) +
facet_grid(protegens~as.factor(Heat)) +
geom_point(alpha=0.5) +
scale_colour_viridis_d(option = "viridis")
## Warning: Removed 56 rows containing missing values or values outside the scale range
## (`geom_point()`).
ggplot(species_div.df,
aes(y=HillDiv_q2, x=Day, colour=as.factor(CommRich))) +
facet_grid(protegens~as.factor(Heat)) +
geom_point(alpha=0.5) +
scale_colour_viridis_d(option = "viridis")
## Warning: Removed 56 rows containing missing values or values outside the scale range
## (`geom_point()`).
# this one gives infinite values. That's not useful.
################################
# Plot figure for main text: Figure 3b
################################
# The Hill Diversity with q=1 seems useful!! Let's include this plot in the final manuscript:
# re-order the levels of heat so that the control appears first
species_div.df$Heat <- factor(species_div.df$Heat,
levels = levels(species_div.df$Heat)[c(5,1:4)])
# change variable names for nicer plotting
levels(species_div.df$Heat)[2:5] <- paste(levels(species_div.df$Heat)[2:5], "hrs")
species_div.df$Pprot_facet <- ifelse(species_div.df$protegens == 0, "P. protegens absent", "P. protegens present")
# create a data.frame for plotting red rectangles in the background
bckgrd <- data.frame(Heat=levels(species_div.df$Heat),
HillDiv_q1 = c(0, rep(2.4, 4)),
Day = c(0, rep(0.8, 4))) # all heat treatments start at the same time
test <- rbind(bckgrd,
data.frame(Heat=levels(species_div.df$Heat),
HillDiv_q1 = c(0, rep(2.4, 4)),
Day = c(0, 1.1, 1.5, 1.9, 2.9))) # choose end points that look good even if not perfectly accurate
png(filename = "./figures/Fig3_B.png", width = 6.98, height = 4.52, units = "in", res = 300)
ggplot(species_div.df %>% filter(CommRich > 1),
# exclude monocultures bc richness is not informative for these: their richness will always be equal to 1 even when they go extinct.
aes(y=HillDiv_q1, x=Day)) +
facet_grid(Pprot_facet ~ as.factor(Heat),
scales = "free_x", # allow x-axis of facets to freely choose their own max values
space = "free_x") + # allow facet columns to differ in their sizes
# add red rectangles in the background to indicate heat treatment as in Fig 1.
geom_ribbon(data=test, aes(ymin=0.95, ymax=2.4), # add a bit of padding above & below the points to look nice
position = "identity", # not sure this is needed now that I've switched away from geom_area? But it works so I don't care
fill="#C43131", alpha=0.25) + # use same colour and alpha as in fill for Fig. 1
# use beeswarm to jitter the points properly (alpha must be set to 0 bc of red rectangles)
geom_quasirandom(aes(fill=as.factor(CommRich)), # fill as a function of CommRich *must* be inside of this function otherwise it leads to a lot of problems with the geom_ribbon layer
pch=21) + # use points with fill and border bc they look nicer here
scale_fill_viridis_d(option = "viridis", begin=0.85, end=0) +
scale_x_continuous(breaks=1:5, # tick marks & tick labels only at integers
limits=c(0.5,NA), # add a little extra padding on the left side of x-axis bc I think it looks better
expand = c(0, 0.2)) + # for some reason this prevents points in the 6h facet from getting squished up against the right border of the facet
scale_y_continuous(expand = c(0, 0)) + # this prevents ggplot from adding any extra padding in the y-axis; we already added the default +/- 0.05 padding manually when we specified geom_ribbon
labs(y = "Observed Richness", # 1st order Hill Diversity
fill = "Inoculated\nRichness") +
guides(fill = guide_legend(override.aes = list(size=3), # make the points bigger in the legend
title.hjust = 0.5)) + # justify the 2nd line of legend title & points so they sit nicely under the 1st line
theme(strip.background = element_rect(fill = "white", colour="black"),
strip.text = element_text(color = "black"),
legend.box.spacing = margin(0.5), # stretch plot rightwards & closer to its legend
legend.title = element_text(size=14)) # adjust the legend title size
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
dev.off()
## png
## 2
# clean up
rm(species_div.df, bckgrd, test)
Notice that for the control condition in the absence of protegens, there is a trend of decreasing species richness over time (i.e., as the communities approach equilibrium).
( ## Analysis?? )
I have effect sizes on richness for resistance and recovery, but I’m not sure if this is going to make it into the manuscript… (see analyze_temp_serial_transfer_expt--28Oct24.html if you’re interested).
Which communities were most likely to go extinct? How long did the heat duration have to be in order to drive those communities to extinction?
The simplest hypothesis is that heat duration alone explains whether a community happens to go extinct.
A slightly more complex hypothesis from the thermal performance curve data (Fig. 2) would be that any species that is resistant to heat should be less likely to go extinct, even under long-duration heat. Therefore we would expect that the presence/absence of the heat-resistant species, P. putida, should explain whether a community goes extinct.
Maddy’s hypothesis in setting up this experiment was that a higher inoculated species richness would make a community more resistant to heat. So we are going to check whether the inoculated species richness has any effect.
Another hypothesis that emerges from looking at the time series data itself (e.g., the ordination data) is that protegens has a unique effect on all communities where it is present. So let’s check that model as well.
# keep just the data on the last day of each time series
extinct.df <- absDensity %>% filter(Recov_Day == 2)
extinct.df <- rbind(extinct.df,
absDensity %>% filter(Heat == 0, Day == 5))
# binary vector of survival or extinction
extinct.df <- extinct.df %>% mutate(survived = ifelse(Total_density > 0, 1, 0))
### note that sample "24-07-08 Epoch G1" has missing data on Day 5 even though we know from the OD data that it survived.
extinct.df$survived[extinct.df$uniqID == "24-07-08 Epoch G1"] <- 1
# make protegens into a factor
extinct.df$protegens <- factor(extinct.df$protegens)
# fit the models
ext_mod.heat <- glmmTMB(survived ~ Heat,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.heat, plot = TRUE)
ext_mod.heat_plus_rich <- glmmTMB(survived ~ CommRich + Heat,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.heat_plus_rich, plot = TRUE)
ext_mod.heat_plus_prot <- glmmTMB(survived ~ Heat + protegens,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
simulateResiduals(fittedModel = ext_mod.heat_plus_prot, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.9645332 0.959227 0.02965619 0.2301006 0.1030454 0.9776571 0.827906 0.3086776 0.4034651 0.7309843 0.8247881 0.9087348 0.7163956 0.5443066 0.7949462 0.4422744 0.08401695 0.5889396 0.5640054 0.7787241 ...
ext_mod.heat_plus_resist <- glmmTMB(survived ~ Heat + resistant,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.heat_plus_resist, plot = TRUE)
ext_mod.heat_by_rich <- glmmTMB(survived ~ CommRich*Heat,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.heat_by_rich, plot = TRUE)
ext_mod.heat_by_prot <- glmmTMB(survived ~ Heat*protegens,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; singular convergence (7). See vignette('troubleshooting'),
## help('diagnose')
#simulateResiduals(fittedModel = ext_mod.heat_by_prot, plot = TRUE)
ext_mod.heat_rich_prot <- glmmTMB(survived ~ CommRich + Heat + protegens,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.heat_rich_prot, plot = TRUE)
ext_mod.rich_heatXprot <- glmmTMB(survived ~ CommRich + Heat*protegens,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.rich_heatXprot, plot = TRUE)
#anova(ext_mod.heat, ext_mod.heat_plus_rich)
#anova(ext_mod.heat, ext_mod.heat_by_rich)
#anova(ext_mod.heat_plus_rich, ext_mod.heat_by_rich)
AIC(ext_mod.heat, ext_mod.heat_plus_rich, ext_mod.heat_plus_prot, ext_mod.heat_by_rich, ext_mod.heat_by_prot, ext_mod.heat_rich_prot, ext_mod.rich_heatXprot, ext_mod.heat_plus_resist) %>% arrange(AIC)
AICc(ext_mod.heat, ext_mod.heat_plus_rich, ext_mod.heat_plus_prot, ext_mod.heat_by_rich, ext_mod.heat_by_prot, ext_mod.heat_rich_prot, ext_mod.rich_heatXprot, ext_mod.heat_plus_resist) %>% arrange(AICc)
BIC(ext_mod.heat, ext_mod.heat_plus_rich, ext_mod.heat_plus_prot, ext_mod.heat_by_rich, ext_mod.heat_by_prot, ext_mod.heat_rich_prot, ext_mod.rich_heatXprot, ext_mod.heat_plus_resist) %>% arrange(BIC)
summary(ext_mod.heat_rich_prot)
## Family: binomial ( logit )
## Formula: survived ~ CommRich + Heat + protegens
## Data: extinct.df
##
## AIC BIC logLik deviance df.resid
## 62.1 77.2 -27.0 54.1 320
##
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.688e+00 1.512e+00 3.763 0.000168 ***
## CommRich 5.080e-01 5.035e-01 1.009 0.312982
## Heat -1.445e-01 3.096e-02 -4.666 3.07e-06 ***
## protegens1 2.305e+01 1.592e+04 0.001 0.998844
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(ext_mod.heat_plus_prot)
## Family: binomial ( logit )
## Formula: survived ~ Heat + protegens
## Data: extinct.df
##
## AIC BIC logLik deviance df.resid
## 61.1 72.5 -27.6 55.1 321
##
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.459e+00 1.340e+00 4.820 1.43e-06 ***
## Heat -1.425e-01 3.038e-02 -4.690 2.73e-06 ***
## protegens1 2.363e+01 1.849e+04 0.001 0.999
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(ext_mod.heat, ext_mod.heat_plus_prot)
anova(ext_mod.heat_plus_prot, ext_mod.heat_rich_prot)
# and let's report the R-squared for this
efronRSquared(residual = residuals(ext_mod.heat_plus_prot, type="response"),
predicted = predict(ext_mod.heat_plus_prot, type="response"),
statistic = "EfronRSquared")
## EfronRSquared
## 0.501
This tells us that the most important predictors are: 1. the duration of the heat event, and 2. the presence/absence of protegens in the inoculated community. This model explains about 50% of the variation in the data. We have little power to detect an effect of inoculated community richness on extinction.
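For reference, Efron’s pseudo-R² as reported by efronRSquared() can be hand-checked: it is 1 − SSE/SST computed on the response scale, with the mean of the observed 0/1 outcomes as the baseline. A minimal sketch with hypothetical values (not the model objects above):

```r
# Efron's pseudo-R^2 by hand (sketch): 1 - SSE/SST on the response scale.
efron_r2 <- function(observed, predicted) {
  1 - sum((observed - predicted)^2) / sum((observed - mean(observed))^2)
}

# hypothetical toy checks
efron_r2(c(0, 1, 1, 0), c(0, 1, 1, 0))          # perfect predictions -> 1
efron_r2(c(0, 1, 1, 0), c(0.2, 0.8, 0.7, 0.3))  # imperfect predictions -> < 1
```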
A final possibility is that the growth rates of the different communities can explain whether they go extinct. Let’s check if the average growth rate of the community at 30C can predict its extinction…
ext_mod.expect_mu <- glmmTMB(survived ~ Heat + community_expected_mu,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.expect_mu, plot = TRUE)
ext_mod.exptmu_prot <- glmmTMB(survived ~ Heat + community_expected_mu + protegens,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
simulateResiduals(fittedModel = ext_mod.exptmu_prot, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.9628912 0.9384698 0.02965619 0.4072284 0.1428897 0.9776571 0.6471349 0.3086776 0.6884299 0.7309843 0.8014265 0.9087348 0.7163956 0.5443066 0.7949462 0.5218201 0.08401695 0.5889396 0.05015453 0.7787241 ...
ext_mod.averaged_mu <- glmmTMB(survived ~ Heat + community_averaged_mu,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.averaged_mu, plot = TRUE)
ext_mod.avemu_prot <- glmmTMB(survived ~ Heat + community_averaged_mu + protegens,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
simulateResiduals(fittedModel = ext_mod.avemu_prot, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.4089731 0.6516057 0.5394315 0.4501793 0.9890527 0.5333557 0.2309646 0.3948581 0.6868779 0.02586429 0.5943815 0.08103522 0.8765983 0.7344452 0.2819697 0.6605734 0.1427851 0.5192184 0.291833 0.4861997 ...
ext_mod.exptmu_prot_resist <- glmmTMB(survived ~ Heat + community_expected_mu + protegens + resistant,
data = extinct.df,
family = binomial,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; singular convergence (7). See vignette('troubleshooting'),
## help('diagnose')
#simulateResiduals(fittedModel = ext_mod.exptmu_prot_resist, plot = TRUE)
summary(ext_mod.exptmu_prot)
## Family: binomial ( logit )
## Formula: survived ~ Heat + community_expected_mu + protegens
## Data: extinct.df
##
## AIC BIC logLik deviance df.resid
## 42.1 57.2 -17.0 34.1 320
##
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.080e+00 3.035e+00 -1.015 0.31019
## Heat -2.152e-01 5.309e-02 -4.054 5.03e-05 ***
## community_expected_mu 1.463e+01 5.024e+00 2.912 0.00359 **
## protegens1 2.433e+01 2.154e+04 0.001 0.99910
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(ext_mod.avemu_prot)
## Family: binomial ( logit )
## Formula: survived ~ Heat + community_averaged_mu + protegens
## Data: extinct.df
##
## AIC BIC logLik deviance df.resid
## 35.8 51.0 -13.9 27.8 320
##
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.2158 2.9907 -1.075 0.2823
## Heat -0.5403 0.2515 -2.149 0.0317 *
## community_averaged_mu 26.8226 12.9368 2.073 0.0381 *
## protegens1 27.7150 24864.3060 0.001 0.9991
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# this model does not converge bc of the experimental design (some predictor combinations contain only a single outcome, i.e., only survivors or only extinctions, so the likelihood is unbounded)
summary(ext_mod.exptmu_prot_resist)
## Family: binomial ( logit )
## Formula:
## survived ~ Heat + community_expected_mu + protegens + resistant
## Data: extinct.df
##
## AIC BIC logLik deviance df.resid
## NA NA NA NA 319
##
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 36.833 NaN NaN NaN
## Heat -1.711 NaN NaN NaN
## community_expected_mu 8.399 NaN NaN NaN
## protegens1 79.339 NaN NaN NaN
## resistant 37.915 NaN NaN NaN
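The huge, meaningless protegens1 estimates and standard errors above (and the NaNs in this last fit) are the classic signature of complete separation: when a binary predictor perfectly predicts the outcome, the ML estimate of its coefficient diverges. A minimal illustration with hypothetical data (plain glm() here, but the same thing happens in glmmTMB):

```r
# Complete separation sketch (hypothetical data): x == 1 always survives,
# x == 0 never does, so the logit coefficient for x runs off to +Inf and
# the fitting algorithm stops at some arbitrarily large value with a
# gigantic standard error.
y <- c(0, 0, 0, 1, 1, 1)
x <- c(0, 0, 0, 1, 1, 1)
fit <- suppressWarnings(glm(y ~ x, family = binomial))  # warns about fitted probs of 0/1
coef(fit)["x"]                      # absurdly large estimate
summary(fit)$coefficients["x", 2]   # absurdly large std. error
```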
## this should go in a single table instead of 3 different ones:
AIC(ext_mod.heat, ext_mod.heat_plus_prot, ext_mod.rich_heatXprot, ext_mod.heat_plus_resist, ext_mod.expect_mu, ext_mod.averaged_mu, ext_mod.exptmu_prot, ext_mod.avemu_prot, ext_mod.exptmu_prot_resist) %>% arrange(AIC)
AICc(ext_mod.heat, ext_mod.heat_plus_prot, ext_mod.rich_heatXprot, ext_mod.heat_plus_resist, ext_mod.expect_mu, ext_mod.averaged_mu, ext_mod.exptmu_prot, ext_mod.avemu_prot, ext_mod.exptmu_prot_resist) %>% arrange(AICc)
BIC(ext_mod.heat, ext_mod.heat_plus_prot, ext_mod.rich_heatXprot, ext_mod.heat_plus_resist, ext_mod.expect_mu, ext_mod.averaged_mu, ext_mod.exptmu_prot, ext_mod.avemu_prot, ext_mod.exptmu_prot_resist) %>% arrange(BIC)
# wow, I'm shocked that this growth rate model is actually better. Let's double check that...
anova(ext_mod.heat_plus_prot, ext_mod.exptmu_prot)
anova(ext_mod.heat_plus_prot, ext_mod.avemu_prot)
# get the 95% confidence intervals:
print("expected mu:")
## [1] "expected mu:"
confint(ext_mod.exptmu_prot)
## 2.5 % 97.5 % Estimate
## (Intercept) -9.028488 2.8685403 -3.0799739
## Heat -0.319264 -0.1111684 -0.2152162
## community_expected_mu 4.784301 24.4766683 14.6304845
## protegens1 -42192.404379 42241.0685341 24.3320775
print("realized average mu:")
## [1] "realized average mu:"
confint(ext_mod.avemu_prot)
## 2.5 % 97.5 % Estimate
## (Intercept) -9.077401 2.645851e+00 -3.2157748
## Heat -1.033214 -4.744673e-02 -0.5403302
## community_averaged_mu 1.466930 5.217824e+01 26.8225864
## protegens1 -48705.429291 4.876086e+04 27.7149920
# and let's report the R-squared
efronRSquared(residual = residuals(ext_mod.exptmu_prot, type="response"),
predicted = predict(ext_mod.exptmu_prot, type="response"),
statistic = "EfronRSquared")
## EfronRSquared
## 0.707
efronRSquared(residual = residuals(ext_mod.avemu_prot, type="response"),
predicted = predict(ext_mod.avemu_prot, type="response"),
statistic = "EfronRSquared")
## EfronRSquared
## 0.747
Ok, so neither the community_expected_mu nor the community_averaged_mu is, on its own, as good a predictor as heat duration plus presence/absence of protegens. But when we add the presence/absence of protegens to either community_expected_mu or community_averaged_mu, we get very good models that are significantly better than the one reported above: the model with just heat duration and protegens presence/absence explains 50% of the variation, while the growth-rate models explain 70-75%. Another way to report this result is to give the \(\Delta\)AIC or \(\Delta\)BIC.
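If we do report \(\Delta\)AIC, that (and the associated Akaike weights) is a one-liner from the AIC table. A sketch using three of the AIC values printed in the summaries above:

```r
# Delta-AIC and Akaike weights from a vector of AIC values
# (numbers taken from the model summaries above).
aic <- c(heat_plus_prot = 61.1, exptmu_prot = 42.1, avemu_prot = 35.8)
delta <- aic - min(aic)                       # 0 for the best model
weights <- exp(-delta / 2) / sum(exp(-delta / 2))  # relative support
round(delta, 1)
round(weights, 3)
```

The weights make the comparison easy to state in one sentence: essentially all of the support goes to the growth-rate models.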
Although the community_averaged_mu is a better fit, I prefer to use the community_expected_mu because it fits better in the framework of “can we predict microbial community dynamics from the species traits?” Moreover, the fit (at least for the extinction data) is not that much worse for the community_expected_mu as compared to the community_averaged_mu – and both are insufficient to explain the data without the effect of protegens anyway!
absDensity <- absDensity %>% select(-community_averaged_mu)
absDen_forFit <- absDen_forFit %>% select(-community_averaged_mu)
extinct.df <- extinct.df %>% select(-community_averaged_mu)
Plot the preferred model against the data: growth on final day ~ heat duration + protegens present + community_expected_mu
# create data.frame for plotting
extinct.df <- cbind(extinct.df,
predicted = predict(ext_mod.exptmu_prot, type="response"))
# plot the predictions against the data
plot(ggplot(extinct.df,
aes(x=as.factor(Heat),
y=survived,
colour=community_expected_mu,
group=as.factor(community_expected_mu))) +
facet_wrap(. ~ protegens,
labeller = as_labeller(c(`0`="P. protegens absent",
`1`="P. protegens present"))) +
geom_hline(yintercept = 0, colour="grey") +
geom_line(aes(y = predicted)) +
geom_jitter(alpha=0.6, width=0.1, height = 0.25) +
# would be nice to use beeswarm package but not sure how to do that as y is factorial here but numeric for the model predictions
scale_y_continuous(breaks = c(0, 1)) +
scale_colour_viridis_c(option = "inferno", end=0.85) +
labs(x="Heat duration (hrs)",
y="Growth in well on last day?", colour="Community\nExpected\nGrowth Rate"))
# plot the effect sizes of the preferred model
extinct_forplot <- data.frame(confint(ext_mod.exptmu_prot))
colnames(extinct_forplot)[1:2] <- c("loCI", "hiCI")
extinct_forplot$predictor <- as.factor(rownames(extinct_forplot))
# protegens effect size is not significant and has giant CI that obscure other estimates
ggplot(extinct_forplot,
aes(x = Estimate, y = predictor)) +
geom_vline(xintercept = 0, colour="grey") +
geom_point() +
geom_errorbarh(aes(xmin = loCI, xmax = hiCI), height=0)
# plot the effect sizes again without protegens
ggplot(extinct_forplot %>% filter(predictor != "protegens1"),
aes(x = Estimate, y = predictor)) +
geom_vline(xintercept = 0, colour="grey") +
geom_point() +
geom_errorbarh(aes(xmin = loCI, xmax = hiCI), height=0)
# clean up
rm(ext_mod.heat, ext_mod.heat_plus_rich, ext_mod.heat_plus_resist, ext_mod.heat_plus_prot, ext_mod.heat_by_rich, ext_mod.heat_by_prot, ext_mod.heat_rich_prot, ext_mod.rich_heatXprot, ext_mod.expect_mu, ext_mod.averaged_mu, ext_mod.exptmu_prot, ext_mod.avemu_prot, extinct_forplot, ext_mod.exptmu_prot_resist)
How is community diversity impacted during and after heat? Here we will have to be mindful to control for inoculated community richness as a nuisance variable (i.e., we will always expect more diversity in communities inoculated with more species, and less in those inoculated with fewer. But this is just part of our experimental design; we’re not interested in this effect per se).
Let’s first plot the Shannon diversity directly to get an idea of what we’re dealing with:
ggplot(absDen_forFit %>% filter(CommRich > 1), # monocultures are meaningless for diversity
aes(y=Diversity, x=Day, fill=as.factor(CommRich))) +
facet_grid(protegens~as.factor(Heat)) +
geom_quasirandom(alpha=0.7, pch=21) +
scale_fill_viridis_d(option = "viridis", begin=0.85, end=0) +
labs(y = "Shannon Diversity",
fill = "Inoculated\nRichness")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`position_quasirandom()`).
ggplot(absDen_forFit %>% filter(CommRich > 1), # monocultures are meaningless for diversity
aes(y=Diversity, x=Day, fill=community_expected_mu)) +
facet_grid(protegens~as.factor(Heat)) +
geom_quasirandom(alpha=0.7, pch=21) +
scale_fill_viridis_c(option = "inferno", end=0.85) +
labs(y = "Shannon Diversity",
fill = "Expected\nCommunity mu")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`position_quasirandom()`).
Maddy and Gerard suggest that I use the full model to estimate effect sizes and then emmeans to estimate the effect sizes post hoc. I’m tailoring this analysis on the example script that Nico sent me (Script_simplified for Hermina.R).
One unique attribute of my experimental design is that the Day used as control differs with heat duration (e.g., the last day of recovery for the 6h heat duration is Day 3 but the last day of recovery for 48h is Day 5). Our solution for this is to run separate models for each heat treatment with its respective controls (i.e., 4 models in total). To make sure that the effect sizes will be directly comparable across the models (especially with respect to the standard deviation), Gerard suggested that I scale the whole data prior to splitting it up into 4 (but not center it, as centering would produce negative values that the positive-valued GLM families below can’t handle). Finally, if/when testing for significance it will then be necessary to control for multiple testing (e.g., using a Bonferroni correction).
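Scaling without centering is just division by the standard deviation, so the rescaled values stay non-negative. A quick sketch with hypothetical diversity values, mirroring the scale() call used below:

```r
# Scale-without-centering sketch (hypothetical values): dividing by the SD
# preserves non-negativity, which the Gamma/lognormal fits below require
# (zeros still need the small-epsilon shift handled later).
x <- c(0, 0.2, 1.3, 2.7)
x_scaled <- scale(x, center = FALSE, scale = sd(x))
all.equal(as.numeric(x_scaled), x / sd(x))  # TRUE: identical to plain division
min(x_scaled) >= 0                          # TRUE: no negative values produced
```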
Note that for diversity I am considering CommRich as a numeric (which
assumes a linear effect of community richness, e.g., where 4 species is
2 * the effect of 2 species). Initially I tried playing around with
CommRich as an ordered (& unordered) factor. But I found that
glmmTMB was choosing to drop different predictors because
it was upset that my model was overparameterized. This was particularly
annoying as it was dropping the estimates for the control
treatments…
The first thing we need to do is to pick which GLM family of distributions looks best for our data:
# remove the monocultures from the data
diversity_forFit <- absDen_forFit %>% filter(CommRich > 1) %>% # diversity is nonsense for monocultures
select(-Total_density, -Conc_putida, -Conc_protegens, -Conc_grimontii, -Conc_veronii)
# scale the data by its standard deviation
diversity_forFit$Diversity_scale <- scale(diversity_forFit$Diversity,
scale = sd(diversity_forFit$Diversity, na.rm = TRUE),
center = FALSE)
# the max re-scaled value is 5.38 and 38% of the data is 0's
# so try gamma and lognormal distributions (maybe also Gaussian just to check that it's a bad fit?)
summary(diversity_forFit$Diversity_scale)
## V1
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.01955
## Mean :0.45128
## 3rd Qu.:0.19557
## Max. :5.37608
## NA's :12
sum(diversity_forFit$Diversity_scale == 0, na.rm = TRUE) / length(diversity_forFit$Diversity_scale)
## [1] 0.3850868
# let's compare different GLM families
try_gaussian <- glmmTMB(Diversity_scale ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_gaussian, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 1 0.196 0.26 0.992 1 0.2 0.208 0.944 0.208 0.248 1 0.18 0.26 0.224 1 0.212 0.212 0.244 1 0.256 ...
print("gamma family with zero-inflated model:")
## [1] "gamma family with zero-inflated model:"
try_gamma0 <- glmmTMB(Diversity_scale ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
family = ziGamma,
ziformula = ~1, # this needs to be added because there are 0 values in the data
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
## Warning in (function (start, objective, gradient = NULL, hessian = NULL, :
## NA/NaN function evaluation
## (the warning above was repeated 12 more times)
simulateResiduals(fittedModel = try_gamma0, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.952 0.42 0.3568358 0.944 0.94 0.3401457 0.3068186 0.88 0.58 0.1703229 0.988 0.456 0.612 0.512 0.964 0.1434111 0.532 0.428 0.984 0.424 ...
print("lognormal family with zero-inflated model:")
## [1] "lognormal family with zero-inflated model:"
try_lognorm0 <- glmmTMB(Diversity_scale ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
family = lognormal,
ziformula = ~1, # this needs to be added because there are 0 values in the data
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_lognorm0, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.944 0.392 0.160635 0.952 0.948 0.1410465 0.2099037 0.948 0.488 0.3268879 0.968 0.476 0.628 0.5 0.972 0.09914734 0.556 0.4 0.988 0.452 ...
print("log(x+1) transformed data, lognormal family with zero-inflated model:")
## [1] "log(x+1) transformed data, lognormal family with zero-inflated model:"
try_LOGlognorm0 <- glmmTMB(log(Diversity_scale+1) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
family = lognormal,
ziformula = ~1, # I'm keeping this as 0-inflated lognormal alone was already over-dispersed. So I want to see if the log(x+1) transformation sufficiently brings in the long tail.
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_LOGlognorm0, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.932 0.38 0.160635 0.94 0.92 0.1410465 0.2099037 0.94 0.488 0.3268879 0.944 0.476 0.644 0.508 0.956 0.09914734 0.568 0.4 0.976 0.448 ...
try_negbinom <- glmmTMB(as.integer(Diversity_scale*1000) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
family = nbinom2,
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_negbinom, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.9 0.484 0.02934308 0.888 0.932 0.001846165 0.1464347 0.844 0.524 0.04893932 0.928 0.545532 0.592 0.498831 0.964 0.08390474 0.62 0.436 0.972 0.496 ...
try_negbinom0 <- glmmTMB(as.integer(Diversity_scale*1000) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
family = nbinom2,
ziformula = ~1, # try zero inflated distribution
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_negbinom0, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.968 0.4760399 0.09975181 0.944 0.96 0.1870824 0.1012531 0.856 0.524 0.2883702 0.972 0.488 0.5489878 0.472 0.972 0.2933996 0.576 0.4214444 0.968 0.48 ...
try_poisson <- glmmTMB(as.integer(Diversity_scale*1000) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_poisson, plot = TRUE)
## DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.992 0.4616676 0.04975974 0.968 0.98 0.001601337 0.06714782 0.96 0.5715112 0.02849031 0.98 0.5892411 0.78 0.728 0.984 0.0152989 0.78 0.332584 0.984 0.4979502 ...
try_poisson0 <- glmmTMB(as.integer(Diversity_scale*1000) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
family = genpois,
ziformula = ~1, # try zero inflated distribution
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_poisson0, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.956 0.3819737 0.211971 0.968 0.964 0.01318631 0.2743196 0.924 0.3770803 0.244544 0.972 0.388 0.584 0.4491087 0.96 0.05561836 0.5923928 0.3206311 0.972 0.356 ...
################################################
# I would generally prefer to use a zero-inflated distribution, but those are awkward to compute effect sizes for (via post hoc emmeans). So let's use the classic hack:
# find the smallest non-zero value in the rescaled diversity estimate
smallest_diversity <- min(diversity_forFit$Diversity_scale[diversity_forFit$Diversity_scale != 0], na.rm=TRUE)
# now add 1/100th of that value to all the diversity estimates and re-do the fit for the family that looked best above
diversity_forFit$Diversity_scalePLUSepsilon <- diversity_forFit$Diversity_scale + smallest_diversity/100
print("data transformed (D + epsilon), lognormal family:")
## [1] "data transformed (D + epsilon), lognormal family:"
# this looks just awful:
try_lognorm_plusE <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_lognorm_plusE, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.912 0.444 0.072 0.932 0.908 0.04 0.044 0.868 0.48 0.028 0.904 0.472 0.608 0.576 0.9 0.056 0.576 0.34 0.88 0.444 ...
# Let's see if we can make things better by using a log transformation (this model looked the best with zero-inflation above)
try_lognorm_logPLUSe <- glmmTMB(log(Diversity_scale+1+smallest_diversity) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_lognorm_logPLUSe, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.96 0.196 0.112 0.972 0.944 0.072 0.088 0.936 0.288 0.052 0.968 0.32 0.544 0.436 0.956 0.084 0.524 0.204 0.956 0.256 ...
# let's just try other families...
try_gaussian_plusE <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_gaussian_plusE, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 1 0.196 0.26 0.992 1 0.2 0.208 0.944 0.208 0.248 1 0.18 0.26 0.224 1 0.212 0.212 0.244 1 0.256 ...
print("data transformed (D + epsilon), gamma family:")
## [1] "data transformed (D + epsilon), gamma family:"
try_gamma_plusE <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
family = Gamma,
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
## Warning in (function (start, objective, gradient = NULL, hessian = NULL, :
## NA/NaN function evaluation
## (the warning above was repeated a further 8 times)
simulateResiduals(fittedModel = try_gamma_plusE, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.932 0.376 0.156 0.88 0.916 0.104 0.152 0.864 0.42 0.108 0.928 0.38 0.516 0.48 0.884 0.104 0.512 0.364 0.944 0.408 ...
try_negbinom_plusE <- glmmTMB(as.integer(Diversity_scalePLUSepsilon*1000) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
family = nbinom2,
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_negbinom_plusE, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.888 0.4291341 0.02065348 0.936 0.952 0.05675836 0.05408069 0.876 0.512 0.04733732 0.98 0.464 0.592 0.616 0.956 0.00626431 0.596 0.443546 0.976 0.4538491 ...
try_poisson_plusE <- glmmTMB(as.integer(Diversity_scalePLUSepsilon*1000) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
data = diversity_forFit,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_poisson_plusE, plot = TRUE)
## DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details
## qu = 0.75, log(sigma) = -2.451302 : outer Newton did not converge fully.
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.992 0.4616676 0.04975974 0.968 0.98 0.001601337 0.06714782 0.96 0.5715112 0.02849031 0.98 0.5892411 0.78 0.728 0.984 0.0152989 0.78 0.331438 0.984 0.496 ...
# let's check this with AIC and BIC
AIC(try_gaussian, try_gamma0, try_lognorm0, try_LOGlognorm0,
try_negbinom, try_negbinom0, try_poisson, try_poisson0, try_lognorm_plusE, try_lognorm_logPLUSe, try_gaussian_plusE, try_gamma_plusE, try_negbinom_plusE, try_poisson_plusE) %>% arrange(AIC)
BIC(try_gaussian, try_gamma0, try_lognorm0, try_LOGlognorm0,
try_negbinom, try_negbinom0, try_poisson, try_poisson0, try_lognorm_plusE, try_lognorm_logPLUSe, try_gaussian_plusE, try_gamma_plusE, try_negbinom_plusE, try_poisson_plusE) %>% arrange(BIC)
# clean up
rm(try_gaussian, try_gamma0, try_lognorm0, try_LOGlognorm0, try_negbinom, try_negbinom0, try_poisson, try_poisson0, try_lognorm_plusE, try_lognorm_logPLUSe, try_gaussian_plusE, try_gamma_plusE, try_negbinom_plusE, try_poisson_plusE, smallest_diversity)
According to the residuals, the zero-inflated negative binomial and the zero-inflated lognormal are about equally acceptable. (We could also take the AIC and BIC values into consideration for our decision, but that is far less important.) My preference would be to go with the zero-inflated lognormal. The reason is that, in my understanding, the most important consideration when selecting a GLM family is which family is a priori the most natural choice. For diversity data, the Gamma and lognormal distributions are the most natural a priori choices because, for 4 species, Shannon diversity is a continuous variable between 0 and log(4) ≈ 1.386294. Therefore I think it makes sense to choose the lognormal (even if its residuals are not perfect).
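As a quick sanity check on the upper bound quoted above (the Shannon diversity of S equally abundant species is log(S)):

```r
# Maximum Shannon diversity for 4 equally abundant species:
# H = -sum(p * log(p)) with p = 1/4 for every species, i.e. log(4).
p <- rep(1/4, 4)
-sum(p * log(p))
## [1] 1.386294
log(4)
## [1] 1.386294
```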
For these data I would generally prefer one of the zero-inflated distributions, but we are interested in the effect sizes (emmeans and post hoc analyses). The problem with a zero-inflated distribution is that it leads to very confusing effect sizes in the downstream analyses, because each effect is split between the conditional and the zero-inflated parts of the model; the overall expected value is (1 - zi)*(conditional mean). See this Stack Overflow thread. The practical problem is that it just looks really strange for these data.
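For reference, glmmTMB's predict() exposes the two parts of a zero-inflated model separately, so the decomposition above can be checked directly. This is an illustrative sketch (not run here); `zi_fit` stands in for any of the ziformula = ~1 fits above:

```r
# Overall expected response of a zero-inflated model:
# (1 - P(structural zero)) * (conditional mean of the non-zero part)
p_zero  <- predict(zi_fit, type = "zprob")        # probability of a structural zero
cond_mu <- predict(zi_fit, type = "conditional")  # mean of the non-zero component
overall <- (1 - p_zero) * cond_mu
# ...which is exactly what type = "response" returns in one call:
all.equal(overall, predict(zi_fit, type = "response"))
```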
To get around this issue, I have transformed the rescaled data by adding a small value (min(rescaled diversity)/100) to all diversity estimates so that the zeros are removed. This way we no longer need a zero-inflated part in the model, and the effect sizes will be simpler to explain. (This matters especially because I think I am losing a lot of power for the 48h treatment, where almost everything is literally 0.)
My final decision is to use the \(D + \epsilon\) transformed data and a lognormal family of distributions. It doesn't have the best residuals, but it strikes the right compromise between being a priori justifiable and being useful and interpretable for the analysis we're interested in.
Finally, note that in the model fitting above I treat Day as a numeric predictor. This is because I want to decide on the GLM family by considering the complete data. (& I was having problems with Day as an un/ordered factor…)
For the rest of the analysis below, I consider the effect of day (called Trtmt_Day) as a factor representing either resistance (i.e., the last day of heat) or recovery. To do this I need to subset the data to include only resistance, the 1st day post-heat, and the last day post-heat for the heat treatment, and I need to keep exactly the same days of the control treatment. I create this data subset for each heat treatment duration (so 4 data subsets in total).
We want to find which model best fits all of the data subsets.
Note that it's not possible to fit any model with both protegens presence and heat resistance as predictors!
# a hacky function to get the number of parameters in the model
npar_of_glmmTMB_fit <- function(modfit)
length(modfit$fit$parfull)
# a function to fit the different models to the subsetted data:
fit_diversity_models <- function(data_subset) {
# create list for output
output.ls <- list()
# this is the simplest model. I'm fitting it to check for collinearity
output.ls[["simple"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat + Trtmt_Day + protegens + community_expected_mu,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# this is another simple model to check for collinearity
output.ls[["simple resist"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat + Trtmt_Day + community_expected_mu + resistant,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# this is our null model:
output.ls[["H0"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# this is another null model just to confirm that CommRich has NO interactions with heat
output.ls[["H0: *CommRich"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich*Heat*Trtmt_Day,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["+resist"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day + resistant,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*resist"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*resistant,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["+prot"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day + protegens,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*protegens,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["+mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day + community_expected_mu,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*community_expected_mu,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["+prot +mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day + protegens + community_expected_mu,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["+resist +mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day + resistant + community_expected_mu,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot +mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*protegens + community_expected_mu,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot + prot*mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*protegens + protegens*community_expected_mu,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu +prot"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*community_expected_mu + protegens,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu + mu*prot"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*community_expected_mu + community_expected_mu*protegens,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu +resist"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*community_expected_mu + resistant,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu + mu*resist"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*community_expected_mu + community_expected_mu*resistant,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*resist +mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*resistant + community_expected_mu,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*resist + resist*mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*resistant + resistant*community_expected_mu,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot*mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*protegens*community_expected_mu,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu*resist"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*community_expected_mu*resistant,
data = data_subset,
family = lognormal,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
return(output.ls)
}
# a function to plot model predictions against the data
# INOCULATED COMMUNITY RICHNESS is plotted as different colours
plot_model_pred.CommRich <- function(mod_list, mod_name){
# create data.frame for plotting
divers_predict <- cbind(mod_list[[mod_name]]$frame,
predicted=predict(mod_list[[mod_name]], type="response"))
# change the first column name for easier plotting
colnames(divers_predict)[1] <- "observed"
# create the plot
out <- ggplot(divers_predict,
aes(x=Trtmt_Day, y=observed, colour=as.factor(CommRich))) +
facet_grid(protegens~Heat) +
geom_jitter(alpha=0.4) +
geom_line(aes(y=predicted, group=as.factor(CommRich))) +
scale_y_log10() +
scale_colour_viridis_d(option = "viridis", begin=0.1, end=0.85) +
labs(y="Shannon diversity (rescaled)",
colour="CommRich",
title=paste(mod_name, "model predictions"))
return(out)
}
# a function to plot model predictions against the data
# EXPECTED COMMUNITY MU is plotted as different colours
plot_model_pred.MU <- function(mod_list, mod_name){
# create data.frame for plotting
divers_predict <- cbind(mod_list[[mod_name]]$frame,
predicted=predict(mod_list[[mod_name]], type="response"))
# change the first column name for easier plotting
colnames(divers_predict)[1] <- "observed"
# create the plot
out <- ggplot(divers_predict,
aes(x=Trtmt_Day, y=observed,
colour=community_expected_mu, group=as.factor(community_expected_mu))) +
facet_grid(protegens~Heat) +
geom_jitter(alpha=0.4) +
geom_line(aes(y=predicted)) +
scale_y_log10() +
scale_colour_viridis_c(option = "inferno", end=0.85) +
labs(y="Shannon diversity (rescaled)",
colour="Expected\nCommunity mu",
title=paste(mod_name, "model predictions"))
return(out)
}
####################
# 6h heat duration
####################
# grab just the treatment with its associated control data
diversity_6h <- rbind(diversity_forFit %>% filter(Heat == "6"),
diversity_forFit %>% filter(Heat == "control", Day < 4))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_6h$Trtmt_Day <- "resist"
diversity_6h$Trtmt_Day[diversity_6h$Day == 2] <- "recov_1"
diversity_6h$Trtmt_Day[diversity_6h$Day == 3] <- "recov_2"
# fit different models:
div_mods6h <- fit_diversity_models(diversity_6h)
# check the simplest possible models for multicollinearity
check_collinearity(div_mods6h[["simple"]])
check_collinearity(div_mods6h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods6h[["*prot + prot*mu"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.06 0.72 0.456 0.956 0.088 0.156 0.768 0.852 0.352 0.892 0.812 0.64 0.248 0.804 0.696 0.728 0.216 0.872 0.284 0.916 ...
summary(div_mods6h[["*prot + prot*mu"]])
## Family: lognormal ( log )
## Formula:
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +
## protegens * community_expected_mu
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## -1202 -1143 617 -1234 280
##
##
## Dispersion parameter for lognormal family (): 1.21
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.12238 0.65007 -1.727 0.08425 .
## CommRich 0.65806 0.07012 9.384 < 2e-16 ***
## Heatcontrol 1.55744 0.17430 8.935 < 2e-16 ***
## Trtmt_Dayrecov_2 -0.22741 0.20855 -1.090 0.27551
## Trtmt_Dayresist 0.05919 0.19731 0.300 0.76419
## protegens -8.00074 1.08149 -7.398 1.38e-13 ***
## community_expected_mu -1.03511 0.66866 -1.548 0.12161
## Heatcontrol:Trtmt_Dayrecov_2 -0.28588 0.25017 -1.143 0.25314
## Heatcontrol:Trtmt_Dayresist 0.17917 0.22391 0.800 0.42360
## Heatcontrol:protegens -1.00383 0.34733 -2.890 0.00385 **
## Trtmt_Dayrecov_2:protegens 0.15475 0.36833 0.420 0.67438
## Trtmt_Dayresist:protegens 0.10914 0.35737 0.305 0.76006
## protegens:community_expected_mu 5.56391 1.17388 4.740 2.14e-06 ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens 0.23599 0.49838 0.474 0.63584
## Heatcontrol:Trtmt_Dayresist:protegens -0.07657 0.47374 -0.162 0.87160
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods6h, npar_of_glmmTMB_fit),
AIC = sapply(div_mods6h, AIC),
AICc = sapply(div_mods6h, AICc),
BIC = sapply(div_mods6h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods6h, mod_name="+prot"))
print(plot_model_pred.MU(mod_list=div_mods6h, mod_name="*prot + prot*mu"))
print(plot_model_pred.MU(mod_list=div_mods6h, mod_name="*prot*mu"))
####################
# 12h heat duration
####################
# grab just the treatment with its associated control data
diversity_12h <- rbind(diversity_forFit %>% filter(Heat == "12", Day > 1),
diversity_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_12h$Trtmt_Day <- "resist"
diversity_12h$Trtmt_Day[diversity_12h$Day == 3] <- "recov_1"
diversity_12h$Trtmt_Day[diversity_12h$Day == 4] <- "recov_2"
# fit different models:
div_mods12h <- fit_diversity_models(diversity_12h)
# check the simplest possible models for multicollinearity
check_collinearity(div_mods12h[["simple"]])
check_collinearity(div_mods12h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods12h[["*prot + prot*mu"]], plot = TRUE)
## qu = 0.75, log(sigma) = -2.030192 : outer Newton did not converge fully.
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.696 0.028 0.964 0.224 0.628 0.356 0.42 0.992 0.072 0.104 0.252 0.46 0.22 0.02 0.744 0.364 0.82 0.828 0.376 0.208 ...
summary(div_mods12h[["*prot + prot*mu"]])
## Family: lognormal ( log )
## Formula:
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +
## protegens * community_expected_mu
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## -1461.6 -1403.3 746.8 -1493.6 266
##
##
## Dispersion parameter for lognormal family (): 1.58
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.24192 0.79020 -0.306 0.75949
## CommRich 0.89015 0.09009 9.881 < 2e-16 ***
## Heatcontrol 2.86009 0.31085 9.201 < 2e-16 ***
## Trtmt_Dayrecov_2 0.52184 0.38211 1.366 0.17204
## Trtmt_Dayresist 0.96973 0.36672 2.644 0.00819 **
## protegens -9.17281 1.28869 -7.118 1.10e-12 ***
## community_expected_mu -4.69674 0.83311 -5.638 1.72e-08 ***
## Heatcontrol:Trtmt_Dayrecov_2 -0.56773 0.41311 -1.374 0.16935
## Heatcontrol:Trtmt_Dayresist -0.66119 0.39557 -1.672 0.09462 .
## Heatcontrol:protegens -2.20145 0.44028 -5.000 5.73e-07 ***
## Trtmt_Dayrecov_2:protegens -0.18892 0.48935 -0.386 0.69945
## Trtmt_Dayresist:protegens -0.92594 0.47977 -1.930 0.05361 .
## protegens:community_expected_mu 8.80168 1.37591 6.397 1.58e-10 ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens -0.41540 0.61712 -0.673 0.50087
## Heatcontrol:Trtmt_Dayresist:protegens 0.72654 0.58996 1.232 0.21813
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods12h, npar_of_glmmTMB_fit),
AIC = sapply(div_mods12h, AIC),
AICc = sapply(div_mods12h, AICc),
BIC = sapply(div_mods12h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods12h, mod_name="*prot"))
print(plot_model_pred.MU(mod_list=div_mods12h, mod_name="*prot + prot*mu"))
print(plot_model_pred.MU(mod_list=div_mods12h, mod_name="*prot*mu"))
####################
# 24h heat duration
####################
# grab just the treatment with its associated control data
diversity_24h <- rbind(diversity_forFit %>% filter(Heat == "24", Day > 1),
diversity_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_24h$Trtmt_Day <- "resist"
diversity_24h$Trtmt_Day[diversity_24h$Day == 3] <- "recov_1"
diversity_24h$Trtmt_Day[diversity_24h$Day == 4] <- "recov_2"
# fit different models:
div_mods24h <- fit_diversity_models(diversity_24h)
# check the simplest possible models for multicollinearity
check_collinearity(div_mods24h[["simple"]])
check_collinearity(div_mods24h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods24h[["*prot + prot*mu"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.848 0.46 0.116 0.808 0.356 0.788 0.304 0.328 0.2 0.872 0.076 0.152 0.164 0 0.932 0.364 0.744 0.232 0.98 0.956 ...
summary(div_mods24h[["*prot + prot*mu"]])
## Family: lognormal ( log )
## Formula:
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +
## protegens * community_expected_mu
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## -998.3 -940.7 515.2 -1030.3 254
##
##
## Dispersion parameter for lognormal family (): 1.35
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.93414 0.81224 -2.381 0.01725 *
## CommRich 0.89152 0.07867 11.333 < 2e-16 ***
## Heatcontrol 2.28475 0.27144 8.417 < 2e-16 ***
## Trtmt_Dayrecov_2 -1.19430 0.38527 -3.100 0.00194 **
## Trtmt_Dayresist 1.99389 0.28818 6.919 4.55e-12 ***
## protegens -5.22837 1.23774 -4.224 2.40e-05 ***
## community_expected_mu -2.11575 0.83966 -2.520 0.01174 *
## Heatcontrol:Trtmt_Dayrecov_2 1.11271 0.41613 2.674 0.00750 **
## Heatcontrol:Trtmt_Dayresist -1.57427 0.30765 -5.117 3.10e-07 ***
## Heatcontrol:protegens -3.49509 0.38270 -9.133 < 2e-16 ***
## Trtmt_Dayrecov_2:protegens 0.32389 0.46735 0.693 0.48829
## Trtmt_Dayresist:protegens -3.56825 0.40387 -8.835 < 2e-16 ***
## protegens:community_expected_mu 5.70594 1.28980 4.424 9.69e-06 ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens -0.89510 0.59839 -1.496 0.13469
## Heatcontrol:Trtmt_Dayresist:protegens 3.25067 0.51526 6.309 2.81e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods24h, npar_of_glmmTMB_fit),
AIC = sapply(div_mods24h, AIC),
AICc = sapply(div_mods24h, AICc),
BIC = sapply(div_mods24h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods24h, mod_name="*prot"))
print(plot_model_pred.MU(mod_list=div_mods24h, mod_name="*prot + prot*mu"))
print(plot_model_pred.MU(mod_list=div_mods24h, mod_name="*prot*mu"))
####################
# 48h heat duration
####################
# grab just the treatment with its associated control data
diversity_48h <- rbind(diversity_forFit %>% filter(Heat == "48", Day > 2),
diversity_forFit %>% filter(Heat == "control", Day > 2))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_48h$Trtmt_Day <- "resist"
diversity_48h$Trtmt_Day[diversity_48h$Day == 4] <- "recov_1"
diversity_48h$Trtmt_Day[diversity_48h$Day == 5] <- "recov_2"
# fit different models:
div_mods48h <- fit_diversity_models(diversity_48h)
## dropping columns from rank-deficient conditional model: Heatcontrol:Trtmt_Dayresist:protegens:community_expected_mu
# check the simplest possible models for multicollinearity
check_collinearity(div_mods48h[["simple"]])
check_collinearity(div_mods48h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods48h[["*prot + prot*mu"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.504 0.468 0.484 0.628 0.368 0.312 0.172 0.572 0.492 0.552 0.412 0.5 0.468 0.4 0.456 0.2 0.24 0.312 0.312 0.904 ...
summary(div_mods48h[["*prot + prot*mu"]])
## Family: lognormal ( log )
## Formula:
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +
## protegens * community_expected_mu
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## -1801.9 -1746.8 917.0 -1833.9 216
##
##
## Dispersion parameter for lognormal family (): 1.21
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.618e+00 8.944e-01 -4.045 5.24e-05
## CommRich 7.245e-01 9.371e-02 7.731 1.07e-14
## Heatcontrol 4.922e+00 3.864e-01 12.739 < 2e-16
## Trtmt_Dayrecov_2 -9.450e-07 5.210e-01 0.000 1.00000
## Trtmt_Dayresist -5.184e-01 1.529e+00 -0.339 0.73463
## protegens -3.822e+00 1.453e+00 -2.631 0.00851
## community_expected_mu -2.871e+00 9.902e-01 -2.900 0.00373
## Heatcontrol:Trtmt_Dayrecov_2 1.838e-01 5.434e-01 0.338 0.73519
## Heatcontrol:Trtmt_Dayresist 5.907e-01 1.537e+00 0.384 0.70077
## Heatcontrol:protegens -4.332e+00 5.327e-01 -8.133 4.20e-16
## Trtmt_Dayrecov_2:protegens 1.906e-01 6.327e-01 0.301 0.76318
## Trtmt_Dayresist:protegens 1.571e-01 1.618e+00 0.097 0.92266
## protegens:community_expected_mu 4.490e+00 1.554e+00 2.890 0.00386
## Heatcontrol:Trtmt_Dayrecov_2:protegens 2.414e-01 7.330e-01 0.329 0.74195
## Heatcontrol:Trtmt_Dayresist:protegens 4.047e-01 1.661e+00 0.244 0.80748
##
## (Intercept) ***
## CommRich ***
## Heatcontrol ***
## Trtmt_Dayrecov_2
## Trtmt_Dayresist
## protegens **
## community_expected_mu **
## Heatcontrol:Trtmt_Dayrecov_2
## Heatcontrol:Trtmt_Dayresist
## Heatcontrol:protegens ***
## Trtmt_Dayrecov_2:protegens
## Trtmt_Dayresist:protegens
## protegens:community_expected_mu **
## Heatcontrol:Trtmt_Dayrecov_2:protegens
## Heatcontrol:Trtmt_Dayresist:protegens
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods48h, npar_of_glmmTMB_fit),
AIC = sapply(div_mods48h, AIC),
AICc = sapply(div_mods48h, AICc),
BIC = sapply(div_mods48h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods48h, mod_name="*prot"))
print(plot_model_pred.MU(mod_list=div_mods48h, mod_name="*prot + prot*mu"))
print(plot_model_pred.MU(mod_list=div_mods48h, mod_name="*prot +mu"))
####################
# Select the best model across all data subsets
####################
# for each information criterion, get the average across all data subsets
meanIC <- data.frame(pars = sapply(div_mods48h, npar_of_glmmTMB_fit),
AIC = sapply(div_mods6h, AIC) + sapply(div_mods12h, AIC) + sapply(div_mods24h, AIC) + sapply(div_mods48h, AIC),
AICc = sapply(div_mods6h, AICc) + sapply(div_mods12h, AICc) + sapply(div_mods24h, AICc) + sapply(div_mods48h, AICc),
BIC = sapply(div_mods6h, BIC) + sapply(div_mods12h, BIC) + sapply(div_mods24h, BIC) + sapply(div_mods48h, BIC)) %>%
mutate(AIC = AIC/4,
AICc = AICc/4,
BIC = BIC/4) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC)
meanIC %>% arrange(BIC)
# clean up
rm(meanIC)
As expected, inoculated community richness has a positive effect on diversity, protegens has a negative effect, and community growth rate has a negative effect. There are interactions with the presence/absence of heat, but the trouble is that we don't want the model to be overly complex (see the model predictions for the most complex model above; the predictions are just rubbish).
The shorter heat durations tend to prefer community growth rate as an interaction effect with heat, especially 12h, where slow-growing communities retained much more diversity in the presence of heat for some reason. As the heat duration gets longer, the data tend to prefer protegens as an interaction effect with heat, especially 48h, where communities with protegens now show more diversity than those without (because there are extinction events in the absence of protegens).
From the model predictions, we can see why it really doesn't make sense to use the very complex model (*prot*mu). I'm going to use the *prot + prot*mu model for the analysis below because it includes all the predictors of interest, is not unnecessarily complex, is the 2nd-best fit for each data subset, and is the 2nd-best fit across the entire data set (after the overly complex *prot*mu).
Even if I wanted to calculate effect sizes for the most complex model, I wouldn’t be able to: there are too many NA values during resistance at 48h heat. This leads to nonEst values:
emmeans(div_mods48h[["*prot*mu"]],
~ Heat | CommRich + Trtmt_Day*protegens*community_expected_mu,
data = diversity_48h, type = "response")
## CommRich = 2.52, Trtmt_Day = recov_1, protegens = 0, community_expected_mu = 0.89:
## Heat response SE df asymp.LCL asymp.UCL
## 48 0.0131 0.00504 Inf 0.00620 0.0279
## control 1.5843 0.25000 Inf 1.16234 2.1593
##
## CommRich = 2.52, Trtmt_Day = recov_2, protegens = 0, community_expected_mu = 0.89:
## Heat response SE df asymp.LCL asymp.UCL
## 48 0.0131 0.00504 Inf 0.00620 0.0279
## control 2.1893 0.27700 Inf 1.70785 2.8066
##
## CommRich = 2.52, Trtmt_Day = resist, protegens = 0, community_expected_mu = 0.89:
## Heat response SE df asymp.LCL asymp.UCL
## 48 nonEst NA NA NA NA
## control 1.7680 0.24600 Inf 1.34670 2.3212
##
## CommRich = 2.52, Trtmt_Day = recov_1, protegens = 1, community_expected_mu = 0.89:
## Heat response SE df asymp.LCL asymp.UCL
## 48 0.0145 0.00403 Inf 0.00845 0.0251
## control 0.0263 0.00742 Inf 0.01513 0.0457
##
## CommRich = 2.52, Trtmt_Day = recov_2, protegens = 1, community_expected_mu = 0.89:
## Heat response SE df asymp.LCL asymp.UCL
## 48 0.0173 0.00473 Inf 0.01011 0.0296
## control 0.0489 0.01180 Inf 0.03045 0.0786
##
## CommRich = 2.52, Trtmt_Day = resist, protegens = 1, community_expected_mu = 0.89:
## Heat response SE df asymp.LCL asymp.UCL
## 48 0.0103 0.00489 Inf 0.00407 0.0261
## control 0.0491 0.01180 Inf 0.03060 0.0788
##
## Confidence level used: 0.95
## Intervals are back-transformed from the log scale
# use emmeans to get the effect size during heat as compared to control for each of the treatment days AND conditional on protegens
emm_6h <- emmeans(div_mods6h[["*prot + prot*mu"]],
~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
data = diversity_6h, type = "response")
effect_6h <- eff_size(emm_6h, sigma(div_mods6h[["*prot + prot*mu"]]),
edf = df.residual(div_mods6h[["*prot + prot*mu"]]))
emm_12h <- emmeans(div_mods12h[["*prot + prot*mu"]],
~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
data = diversity_12h, type = "response")
effect_12h <- eff_size(emm_12h, sigma(div_mods12h[["*prot + prot*mu"]]),
edf = df.residual(div_mods12h[["*prot + prot*mu"]]))
emm_24h <- emmeans(div_mods24h[["*prot + prot*mu"]],
~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
data = diversity_24h, type = "response")
effect_24h <- eff_size(emm_24h, sigma(div_mods24h[["*prot + prot*mu"]]),
edf = df.residual(div_mods24h[["*prot + prot*mu"]]))
emm_48h <- emmeans(div_mods48h[["*prot + prot*mu"]],
~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
data = diversity_48h, type = "response")
effect_48h <- eff_size(emm_48h, sigma(div_mods48h[["*prot + prot*mu"]]),
edf = df.residual(div_mods48h[["*prot + prot*mu"]]))
# a function that extracts the confidence intervals from eff_size *** contingent on protegens ***
get_effsize_CIs <- function(eff_size_object, heat_trtmt) {
ci <- confint(eff_size_object) # call confint() once and index its columns
data.frame(Heat = heat_trtmt,
CommRich = ci[[2]],
Trtmt_Day = ci[[3]],
protegens = ci[[4]],
expected_mu = ci[[5]],
est = ci[[6]],
loCI = ci[[9]],
hiCI = ci[[10]])
}
# create a data.frame for plotting marginal effect sizes using a forest plot with the group labels
div_effects_protegens <- data.frame()
div_effects_protegens <- rbind(div_effects_protegens,
get_effsize_CIs(effect_6h, heat_trtmt = 6),
get_effsize_CIs(effect_12h, heat_trtmt = 12),
get_effsize_CIs(effect_24h, heat_trtmt = 24),
get_effsize_CIs(effect_48h, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
div_effects_protegens$Trtmt_Day <- factor(div_effects_protegens$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(div_effects_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
# plot conditional part of the model
ggplot(div_effects_protegens,
aes(x = est, y = as.factor(Heat), colour = Trtmt_Day, shape = as.logical(protegens))) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = loCI, xmax = hiCI), height = 0.1) +
scale_colour_manual(values=trtmt_pal) +
labs(title = "CommRich + Heat*Trtmt_Day*prot + com_expect_mu*prot",
x = "Effect Size on Shannon Diversity",
shape = "protegens\npresent?",
y="Heat duration")
# we can do a posthoc on this to illustrate statistically significant effects
posthocPROT_6h <- emmeans(effect_6h, pairwise ~ Trtmt_Day*protegens, data = diversity_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_12h <- emmeans(effect_12h, pairwise ~ Trtmt_Day*protegens, data = diversity_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_24h <- emmeans(effect_24h, pairwise ~ Trtmt_Day*protegens, data = diversity_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_48h <- emmeans(effect_48h, pairwise ~ Trtmt_Day*protegens, data = diversity_48h)
## NOTE: Results may be misleading due to involvement in interactions
# a function that extracts the confidence intervals from a post-hoc object
get_posthoc <- function(posthoc_object, heat_trtmt) {
output <- multcomp::cld(posthoc_object, alpha=0.05/4, Letters = letters) %>%
data.frame() %>%
select(-df)
colnames(output)[3:7] <- c("est", "SE", "loCI", "hiCI", "groups")
output$Heat <- heat_trtmt
return(output)
}
# create a data.frame for plotting
div_effects_protegens <- data.frame()
div_effects_protegens <- rbind(div_effects_protegens,
get_posthoc(posthocPROT_6h, heat_trtmt = 6),
get_posthoc(posthocPROT_12h, heat_trtmt = 12),
get_posthoc(posthocPROT_24h, heat_trtmt = 24),
get_posthoc(posthocPROT_48h, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
div_effects_protegens$Trtmt_Day <- factor(div_effects_protegens$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(div_effects_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
# plot with group labels
ggplot(div_effects_protegens,
aes(x = est, y = as.factor(Heat), colour = Trtmt_Day, shape=as.logical(protegens))) +
facet_grid(~protegens) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = loCI, xmax = hiCI), height = 0.1) +
geom_text(position = position_dodge(width = 0.5),
aes(x=-2.5, label=groups)) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Shannon Diversity",
y="Heat duration",
shape = "protegens\npresent?",
title = "CommRich + Heat*Trtmt_Day*prot + com_expect_mu*prot")
# Note that there is a significant interaction between Treatment Day & protegens!
# I think it may still be okay to average over the effects of protegens because they are not crossed;
# either way, we can still marginalize over the effect of protegens
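As a side note, "averaging over protegens" here relies on how emmeans() marginalizes: any factor omitted from the specs is averaged over, and emmeans warns when that factor is involved in an interaction. A minimal sketch on toy data (not from this experiment; the variable names `toy`, `day`, `grp` are made up for illustration):

```r
library(emmeans)

# toy data: two crossed factors, one of which we will average over
set.seed(1)
toy <- data.frame(y   = rnorm(40) + rep(c(0, 1), each = 20),
                  day = rep(c("a", "b"), times = 20),
                  grp = rep(c("x", "y"), each = 20))
fit <- lm(y ~ day * grp, data = toy)

emmeans(fit, ~ day)        # averaged over the levels of grp (emmeans notes the interaction)
emmeans(fit, ~ day | grp)  # conditional on grp instead
```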
# we can do a posthoc on this to illustrate statistically significant effects
posthoc_6h <- emmeans(effect_6h, pairwise ~ Trtmt_Day, data = diversity_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h <- emmeans(effect_12h, pairwise ~ Trtmt_Day, data = diversity_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h <- emmeans(effect_24h, pairwise ~ Trtmt_Day, data = diversity_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h <- emmeans(effect_48h, pairwise ~ Trtmt_Day, data = diversity_48h)
## NOTE: Results may be misleading due to involvement in interactions
# a function that extracts the confidence intervals from a post-hoc object
## we need to redefine the function because the colnames have changed now
get_posthoc <- function(posthoc_object, heat_trtmt) {
output <- multcomp::cld(posthoc_object, alpha=0.05/4, Letters = letters) %>%
data.frame() %>%
select(-df)
colnames(output)[2:6] <- c("est", "SE", "loCI", "hiCI", "groups")
output$Heat <- heat_trtmt
return(output)
}
# create a data.frame for plotting
div_effects <- data.frame()
div_effects <- rbind(div_effects,
get_posthoc(posthoc_6h, heat_trtmt = 6),
get_posthoc(posthoc_12h, heat_trtmt = 12),
get_posthoc(posthoc_24h, heat_trtmt = 24),
get_posthoc(posthoc_48h, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
div_effects$Trtmt_Day <- factor(div_effects$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(div_effects$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
# plot with group labels
ggplot(div_effects,
aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = loCI, xmax = hiCI), height = 0.1) +
geom_text(position = position_dodge(width = 0.5),
aes(x=-2.5, label=groups)) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Shannon Diversity",
y="Heat duration",
title = "protegens as non-focal predictor (i.e., marginalized)")
#######
# finally, we will do a series of pairwise two-tailed t-tests to compare between heat durations
#######
# reminder to myself: I tried this as a series of z-tests and that made things more optimistic (aka LESS conservative). The t-test is indeed the more conservative option among the parametric tests.
# I also looked into whether it's possible to do a Mann-Whitney test (aka the Wilcoxon rank-sum test; the signed-rank test is its paired counterpart). But, since that is a *non-parametric* test based on ranks, by definition it needs the raw observations (i.e., *not* summary statistics), so it cannot be run from the summary statistics available here.
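As a quick sanity check of the z-vs-t claim above, here is a toy comparison on made-up summary statistics (this assumes `tsum.test()` and its z analogue `zsum.test()` from the BSDA package; the numbers are illustrative only):

```r
library(BSDA) # provides tsum.test() and zsum.test() for tests from summary stats

# the same made-up summary statistics fed to both tests
t_out <- tsum.test(mean.x = 1.0, s.x = 0.8, n.x = 12,
                   mean.y = 0.3, s.y = 0.9, n.y = 12,
                   alternative = "two.sided")
z_out <- zsum.test(mean.x = 1.0, sigma.x = 0.8, n.x = 12,
                   mean.y = 0.3, sigma.y = 0.9, n.y = 12,
                   alternative = "two.sided")

# the z-test treats the variances as known, so its p-value is smaller
# (i.e., less conservative) than the Welch t-test's for the same inputs
c(t_p = t_out$p.value, z_p = z_out$p.value)
```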
# a function that approximates the sample size from each data subset
estimate_n <- function(data_subset, CommRich = 0) {
# keep only heat-treated wells observed on recovery day 2
sub <- data_subset[data_subset$Heat != "control" & data_subset$Trtmt_Day == "recov_2",]
# optionally restrict to a single inoculated richness level (CommRich = 1..4)
if(CommRich %in% 1:4) sub <- sub[sub$CommRich == CommRich,]
# divide the count of unique IDs by 4, as we want the average sample size across CommRich
length(unique(sub$uniqID))/4
}
# a function that runs a two-tailed t-test between row numbers of the div_effects df
run_ttest <- function(row_x, row_y,
summary_stats_df){
# tsum.test() is from the BSDA package; its s.x/s.y arguments expect standard deviations
ttest_object <- tsum.test(mean.x = summary_stats_df$est[row_x],
s.x = summary_stats_df$SE[row_x],
n.x = summary_stats_df$n[row_x],
mean.y = summary_stats_df$est[row_y],
s.y = summary_stats_df$SE[row_y],
n.y = summary_stats_df$n[row_y],
alternative="two.sided")
return(data.frame(t_statistic = ttest_object$statistic,
df = ttest_object$parameters,
pvalue = ttest_object$p.value))
}
# estimate the sample sizes
temp <- div_effects # copy the effects to temp
div_effects <- rbind(temp %>% filter(Heat == 6) %>% mutate(n = estimate_n(diversity_6h)),
temp %>% filter(Heat == 12) %>% mutate(n = estimate_n(diversity_12h)),
temp %>% filter(Heat == 24) %>% mutate(n = estimate_n(diversity_24h)),
temp %>% filter(Heat == 48) %>% mutate(n = estimate_n(diversity_48h)))
rm(temp)
# estimate the SD from the SE
div_effects <- div_effects %>% mutate(SD = SE * sqrt(n)) %>%
# re-order by Heat and Trtmt_Day
arrange(Heat, Trtmt_Day)
# all pairwise combinations of comparisons between the same treatment day for different durations
temp <- t(combn(c(1,4,7,10), 2))
combos <- rbind(temp, temp+1, temp+2)
rm(temp)
# loop through all the combinations and do the t-tests
divEffects_ttests <- data.frame()
for(i in 1:nrow(combos)){
divEffects_ttests <- rbind(divEffects_ttests,
run_ttest(row_x = combos[i,1],
row_y = combos[i,2],
summary_stats_df = div_effects))
}
divEffects_ttests$adjusted_p <- p.adjust(divEffects_ttests$pvalue, method = "bonferroni")
divEffects_ttests$Trtmt_Day <- div_effects$Trtmt_Day[combos[,1]]
divEffects_ttests$Heat_1 <- div_effects$Heat[combos[,1]]
divEffects_ttests$Heat_2 <- div_effects$Heat[combos[,2]]
print(divEffects_ttests)
## t_statistic df pvalue adjusted_p Trtmt_Day Heat_1 Heat_2
## t -1.170623 24.24747 2.531273e-01 1.000000e+00 Resistance 6 12
## t1 -10.327820 23.17869 3.811551e-10 6.860792e-09 Resistance 6 24
## t2 9.788123 11.31564 7.291091e-07 1.312396e-05 Resistance 6 48
## t3 -9.253941 22.45402 4.028679e-09 7.251623e-08 Resistance 12 24
## t4 10.112351 11.28167 5.355511e-07 9.639920e-06 Resistance 12 48
## t5 12.591245 11.28156 5.445801e-08 9.802443e-07 Resistance 24 48
## t6 4.317090 24.20276 2.321737e-04 4.179126e-03 Early Recovery 6 12
## t7 -8.380205 22.96627 1.938259e-08 3.488867e-07 Early Recovery 6 24
## t8 17.537802 16.84888 2.958032e-12 5.324457e-11 Early Recovery 6 48
## t9 -12.567165 22.39988 1.267737e-11 2.281927e-10 Early Recovery 12 24
## t10 14.464154 16.90571 5.971506e-11 1.074871e-09 Early Recovery 12 48
## t11 23.273543 16.99798 2.486209e-14 4.475177e-13 Early Recovery 24 48
## t12 -1.928909 24.24804 6.552685e-02 1.000000e+00 Late Recovery 6 12
## t13 2.366998 21.49121 2.738265e-02 4.928876e-01 Late Recovery 6 24
## t14 22.839750 17.67621 1.441230e-14 2.594215e-13 Late Recovery 6 48
## t15 4.097854 20.91181 5.179505e-04 9.323109e-03 Late Recovery 12 24
## t16 24.384022 17.31424 7.558979e-15 1.360616e-13 Late Recovery 12 48
## t17 19.527338 19.61981 2.542227e-14 4.576009e-13 Late Recovery 24 48
# these p-values seem overly optimistic (note that supplying SEs where tsum.test() expects SDs shrinks the denominator and inflates the t statistic). Use alpha = 1*10^-3
# cleanup
rm(div_mods6h, div_mods12h, div_mods24h, div_mods48h,
emm_6h, emm_12h, emm_24h, emm_48h, effect_6h, effect_12h, effect_24h, effect_48h,
posthocPROT_6h, posthocPROT_12h, posthocPROT_24h, posthocPROT_48h, posthoc_6h, posthoc_12h, posthoc_24h, posthoc_48h,
temp, combos, divEffects_ttests,
div_effects_protegens, div_effects)
## Warning in rm(div_mods6h, div_mods12h, div_mods24h, div_mods48h, emm_6h, :
## object 'temp' not found
We don’t see any significant decoupling here between the effect sizes during resistance and during recovery.
The main effect is that diversity drops substantially for the 48h heat pulse. This is mostly due to loss of the most sensitive species, but it can also be driven by extinction of entire replicates. The data above include all replicates, even ones that went extinct altogether; both monocultures and extinct replicates have a final Shannon diversity of 0.
To show that the strong effect at 48h is not driven entirely by the extinct replicates, let’s re-do the entire analysis above using only the data without the extinct replicates.
# add a column indicating whether the replicate survived
# but first we need to remove $Heat because it's a factor for diversity but numeric for extinctions and cannot be *_joined
tmp_div <- diversity_forFit %>% select(-Heat)
tmp_div <- inner_join(tmp_div,
extinct.df %>% select(uniqID, survived),
by = c("uniqID"))
diversity_forFit$survived <- tmp_div$survived
rm(tmp_div)
# keep just the diversity values that did not go extinct
diversity_forFit <- diversity_forFit %>% filter(survived == 1)
####################
# 6h heat duration
####################
# grab just the treatment with its associated control data
diversity_6h <- rbind(diversity_forFit %>% filter(Heat == "6"),
diversity_forFit %>% filter(Heat == "control", Day < 4))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_6h$Trtmt_Day <- "resist"
diversity_6h$Trtmt_Day[diversity_6h$Day == 2] <- "recov_1"
diversity_6h$Trtmt_Day[diversity_6h$Day == 3] <- "recov_2"
# fit different models:
div_mods6h <- fit_diversity_models(diversity_6h)
# check the simplest possible models for multicolinearity
check_collinearity(div_mods6h[["simple"]])
check_collinearity(div_mods6h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods6h[["*prot + prot*mu"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.06 0.72 0.456 0.956 0.088 0.156 0.768 0.852 0.352 0.892 0.812 0.64 0.248 0.804 0.696 0.728 0.216 0.872 0.284 0.916 ...
summary(div_mods6h[["*prot + prot*mu"]])
## Family: lognormal ( log )
## Formula:
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +
## protegens * community_expected_mu
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## -1202 -1143 617 -1234 280
##
##
## Dispersion parameter for lognormal family (): 1.21
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.12238 0.65007 -1.727 0.08425 .
## CommRich 0.65806 0.07012 9.384 < 2e-16 ***
## Heatcontrol 1.55744 0.17430 8.935 < 2e-16 ***
## Trtmt_Dayrecov_2 -0.22741 0.20855 -1.090 0.27551
## Trtmt_Dayresist 0.05919 0.19731 0.300 0.76419
## protegens -8.00074 1.08149 -7.398 1.38e-13 ***
## community_expected_mu -1.03511 0.66866 -1.548 0.12161
## Heatcontrol:Trtmt_Dayrecov_2 -0.28588 0.25017 -1.143 0.25314
## Heatcontrol:Trtmt_Dayresist 0.17917 0.22391 0.800 0.42360
## Heatcontrol:protegens -1.00383 0.34733 -2.890 0.00385 **
## Trtmt_Dayrecov_2:protegens 0.15475 0.36833 0.420 0.67438
## Trtmt_Dayresist:protegens 0.10914 0.35737 0.305 0.76006
## protegens:community_expected_mu 5.56391 1.17388 4.740 2.14e-06 ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens 0.23599 0.49838 0.474 0.63584
## Heatcontrol:Trtmt_Dayresist:protegens -0.07657 0.47374 -0.162 0.87160
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods6h, npar_of_glmmTMB_fit),
AIC = sapply(div_mods6h, AIC),
AICc = sapply(div_mods6h, AICc),
BIC = sapply(div_mods6h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods6h, mod_name="+prot"))
print(plot_model_pred.MU(mod_list=div_mods6h, mod_name="*prot + prot*mu"))
print(plot_model_pred.MU(mod_list=div_mods6h, mod_name="*prot*mu"))
####################
# 12h heat duration
####################
# grab just the treatment with its associated control data
diversity_12h <- rbind(diversity_forFit %>% filter(Heat == "12", Day > 1),
diversity_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_12h$Trtmt_Day <- "resist"
diversity_12h$Trtmt_Day[diversity_12h$Day == 3] <- "recov_1"
diversity_12h$Trtmt_Day[diversity_12h$Day == 4] <- "recov_2"
# fit different models:
div_mods12h <- fit_diversity_models(diversity_12h)
# check the simplest possible models for multicolinearity
check_collinearity(div_mods12h[["simple"]])
check_collinearity(div_mods12h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods12h[["*prot + prot*mu"]], plot = TRUE)
## qu = 0.75, log(sigma) = -2.030192 : outer Newton did not converge fully.
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.696 0.028 0.964 0.224 0.628 0.356 0.42 0.992 0.072 0.104 0.252 0.46 0.22 0.02 0.744 0.364 0.82 0.828 0.376 0.208 ...
summary(div_mods12h[["*prot + prot*mu"]])
## Family: lognormal ( log )
## Formula:
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +
## protegens * community_expected_mu
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## -1461.6 -1403.3 746.8 -1493.6 266
##
##
## Dispersion parameter for lognormal family (): 1.58
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.24192 0.79020 -0.306 0.75949
## CommRich 0.89015 0.09009 9.881 < 2e-16 ***
## Heatcontrol 2.86009 0.31085 9.201 < 2e-16 ***
## Trtmt_Dayrecov_2 0.52184 0.38211 1.366 0.17204
## Trtmt_Dayresist 0.96973 0.36672 2.644 0.00819 **
## protegens -9.17281 1.28869 -7.118 1.10e-12 ***
## community_expected_mu -4.69674 0.83311 -5.638 1.72e-08 ***
## Heatcontrol:Trtmt_Dayrecov_2 -0.56773 0.41311 -1.374 0.16935
## Heatcontrol:Trtmt_Dayresist -0.66119 0.39557 -1.672 0.09462 .
## Heatcontrol:protegens -2.20145 0.44028 -5.000 5.73e-07 ***
## Trtmt_Dayrecov_2:protegens -0.18892 0.48935 -0.386 0.69945
## Trtmt_Dayresist:protegens -0.92594 0.47977 -1.930 0.05361 .
## protegens:community_expected_mu 8.80168 1.37591 6.397 1.58e-10 ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens -0.41540 0.61712 -0.673 0.50087
## Heatcontrol:Trtmt_Dayresist:protegens 0.72654 0.58996 1.232 0.21813
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods12h, npar_of_glmmTMB_fit),
AIC = sapply(div_mods12h, AIC),
AICc = sapply(div_mods12h, AICc),
BIC = sapply(div_mods12h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods12h, mod_name="*prot"))
print(plot_model_pred.MU(mod_list=div_mods12h, mod_name="*prot + prot*mu"))
print(plot_model_pred.MU(mod_list=div_mods12h, mod_name="*prot*mu"))
####################
# 24h heat duration
####################
# grab just the treatment with its associated control data
diversity_24h <- rbind(diversity_forFit %>% filter(Heat == "24", Day > 1),
diversity_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_24h$Trtmt_Day <- "resist"
diversity_24h$Trtmt_Day[diversity_24h$Day == 3] <- "recov_1"
diversity_24h$Trtmt_Day[diversity_24h$Day == 4] <- "recov_2"
# fit different models:
div_mods24h <- fit_diversity_models(diversity_24h)
# check the simplest possible models for multicolinearity
check_collinearity(div_mods24h[["simple"]])
check_collinearity(div_mods24h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods24h[["*prot + prot*mu"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.848 0.46 0.116 0.808 0.356 0.788 0.304 0.328 0.2 0.872 0.076 0.152 0.164 0 0.932 0.364 0.744 0.232 0.98 0.956 ...
summary(div_mods24h[["*prot + prot*mu"]])
## Family: lognormal ( log )
## Formula:
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +
## protegens * community_expected_mu
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## -998.3 -940.7 515.2 -1030.3 254
##
##
## Dispersion parameter for lognormal family (): 1.35
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.93414 0.81224 -2.381 0.01725 *
## CommRich 0.89152 0.07867 11.333 < 2e-16 ***
## Heatcontrol 2.28475 0.27144 8.417 < 2e-16 ***
## Trtmt_Dayrecov_2 -1.19430 0.38527 -3.100 0.00194 **
## Trtmt_Dayresist 1.99389 0.28818 6.919 4.55e-12 ***
## protegens -5.22837 1.23774 -4.224 2.40e-05 ***
## community_expected_mu -2.11575 0.83966 -2.520 0.01174 *
## Heatcontrol:Trtmt_Dayrecov_2 1.11271 0.41613 2.674 0.00750 **
## Heatcontrol:Trtmt_Dayresist -1.57427 0.30765 -5.117 3.10e-07 ***
## Heatcontrol:protegens -3.49509 0.38270 -9.133 < 2e-16 ***
## Trtmt_Dayrecov_2:protegens 0.32389 0.46735 0.693 0.48829
## Trtmt_Dayresist:protegens -3.56825 0.40387 -8.835 < 2e-16 ***
## protegens:community_expected_mu 5.70594 1.28980 4.424 9.69e-06 ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens -0.89510 0.59839 -1.496 0.13469
## Heatcontrol:Trtmt_Dayresist:protegens 3.25067 0.51526 6.309 2.81e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods24h, npar_of_glmmTMB_fit),
AIC = sapply(div_mods24h, AIC),
AICc = sapply(div_mods24h, AICc),
BIC = sapply(div_mods24h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods24h, mod_name="*prot"))
print(plot_model_pred.MU(mod_list=div_mods24h, mod_name="*prot + prot*mu"))
print(plot_model_pred.MU(mod_list=div_mods24h, mod_name="*prot*mu"))
####################
# 48h heat duration
####################
# grab just the treatment with its associated control data
diversity_48h <- rbind(diversity_forFit %>% filter(Heat == "48", Day > 2),
diversity_forFit %>% filter(Heat == "control", Day > 2))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_48h$Trtmt_Day <- "resist"
diversity_48h$Trtmt_Day[diversity_48h$Day == 4] <- "recov_1"
diversity_48h$Trtmt_Day[diversity_48h$Day == 5] <- "recov_2"
# fit different models:
div_mods48h <- fit_diversity_models(diversity_48h)
## dropping columns from rank-deficient conditional model: Heatcontrol:Trtmt_Dayresist:protegens:community_expected_mu
# check the simplest possible models for multicolinearity
check_collinearity(div_mods48h[["simple"]])
check_collinearity(div_mods48h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods48h[["*prot + prot*mu"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.48 0.532 0.572 0.604 0.328 0.312 0.176 0.54 0.42 0.5 0.416 0.352 0.504 0.492 0.516 0.18 0.3 0.332 0.904 0.44 ...
summary(div_mods48h[["*prot + prot*mu"]])
## Family: lognormal ( log )
## Formula:
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +
## protegens * community_expected_mu
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## -1563.0 -1509.0 797.5 -1595.0 200
##
##
## Dispersion parameter for lognormal family (): 1.22
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.173e+00 1.016e+00 -3.122 0.00180
## CommRich 7.391e-01 9.482e-02 7.795 6.44e-15
## Heatcontrol 4.656e+00 5.362e-01 8.683 < 2e-16
## Trtmt_Dayrecov_2 1.496e-06 7.377e-01 0.000 1.00000
## Trtmt_Dayresist -7.944e-01 1.576e+00 -0.504 0.61415
## protegens -4.320e+00 1.550e+00 -2.788 0.00531
## community_expected_mu -3.113e+00 1.006e+00 -3.094 0.00197
## Heatcontrol:Trtmt_Dayrecov_2 1.705e-01 7.536e-01 0.226 0.82104
## Heatcontrol:Trtmt_Dayresist 8.643e-01 1.583e+00 0.546 0.58513
## Heatcontrol:protegens -4.065e+00 6.496e-01 -6.259 3.89e-10
## Trtmt_Dayrecov_2:protegens 1.907e-01 8.204e-01 0.232 0.81619
## Trtmt_Dayresist:protegens 4.327e-01 1.663e+00 0.260 0.79464
## protegens:community_expected_mu 4.751e+00 1.570e+00 3.026 0.00248
## Heatcontrol:Trtmt_Dayrecov_2:protegens 2.528e-01 9.000e-01 0.281 0.77882
## Heatcontrol:Trtmt_Dayresist:protegens 1.329e-01 1.703e+00 0.078 0.93780
##
## (Intercept) **
## CommRich ***
## Heatcontrol ***
## Trtmt_Dayrecov_2
## Trtmt_Dayresist
## protegens **
## community_expected_mu **
## Heatcontrol:Trtmt_Dayrecov_2
## Heatcontrol:Trtmt_Dayresist
## Heatcontrol:protegens ***
## Trtmt_Dayrecov_2:protegens
## Trtmt_Dayresist:protegens
## protegens:community_expected_mu **
## Heatcontrol:Trtmt_Dayrecov_2:protegens
## Heatcontrol:Trtmt_Dayresist:protegens
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods48h, npar_of_glmmTMB_fit),
AIC = sapply(div_mods48h, AIC),
AICc = sapply(div_mods48h, AICc),
BIC = sapply(div_mods48h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods48h, mod_name="*prot"))
print(plot_model_pred.MU(mod_list=div_mods48h, mod_name="*prot + prot*mu"))
print(plot_model_pred.MU(mod_list=div_mods48h, mod_name="*prot +mu"))
####################
# Select the best model across all data subsets
####################
# for each information criterion, get the average across all data subsets
meanIC <- data.frame(pars = sapply(div_mods48h, npar_of_glmmTMB_fit),
AIC = sapply(div_mods6h, AIC) + sapply(div_mods12h, AIC) + sapply(div_mods24h, AIC) + sapply(div_mods48h, AIC),
AICc = sapply(div_mods6h, AICc) + sapply(div_mods12h, AICc) + sapply(div_mods24h, AICc) + sapply(div_mods48h, AICc),
BIC = sapply(div_mods6h, BIC) + sapply(div_mods12h, BIC) + sapply(div_mods24h, BIC) + sapply(div_mods48h, BIC)) %>%
mutate(AIC = AIC/4,
AICc = AICc/4,
BIC = BIC/4) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC)
meanIC %>% arrange(BIC)
# clean up
rm(meanIC)
####################
# get the emmeans
####################
# use emmeans to get the effect size during heat as compared to control for each of the treatment days AND conditional on protegens
emm_6h <- emmeans(div_mods6h[["*prot + prot*mu"]],
~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
data = diversity_6h, type = "response")
effect_6h <- eff_size(emm_6h, sigma(div_mods6h[["*prot + prot*mu"]]), edf = df.residual(div_mods6h[["*prot + prot*mu"]]))
emm_12h <- emmeans(div_mods12h[["*prot + prot*mu"]],
~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
data = diversity_12h, type = "response")
effect_12h <- eff_size(emm_12h, sigma(div_mods12h[["*prot + prot*mu"]]), edf = df.residual(div_mods12h[["*prot + prot*mu"]]))
emm_24h <- emmeans(div_mods24h[["*prot + prot*mu"]],
~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
data = diversity_24h, type = "response")
effect_24h <- eff_size(emm_24h, sigma(div_mods24h[["*prot + prot*mu"]]), edf = df.residual(div_mods24h[["*prot + prot*mu"]]))
emm_48h <- emmeans(div_mods48h[["*prot + prot*mu"]],
~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
data = diversity_48h, type = "response")
effect_48h <- eff_size(emm_48h, sigma(div_mods48h[["*prot + prot*mu"]]), edf = df.residual(div_mods48h[["*prot + prot*mu"]]))
# create a data.frame for plotting marginal effect sizes using a forest plot with the group labels
div_effects_protegens <- data.frame()
div_effects_protegens <- rbind(div_effects_protegens,
get_effsize_CIs(effect_6h, heat_trtmt = 6),
get_effsize_CIs(effect_12h, heat_trtmt = 12),
get_effsize_CIs(effect_24h, heat_trtmt = 24),
get_effsize_CIs(effect_48h, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
div_effects_protegens$Trtmt_Day <- factor(div_effects_protegens$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(div_effects_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
# plot conditional part of the model
ggplot(div_effects_protegens,
aes(x = est, y = as.factor(Heat), colour = Trtmt_Day, shape = as.logical(protegens))) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = loCI, xmax = hiCI), height = 0.1) +
scale_colour_manual(values=trtmt_pal) +
labs(title = "Extinct reps removed!",
x = "Effect Size on Shannon Diversity",
shape = "protegens\npresent?",
y="Heat duration")
# we can still average over the effect of protegens
# and run a posthoc on this to identify statistically significant effects
posthoc_6h <- emmeans(effect_6h, pairwise ~ Trtmt_Day, data = diversity_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h <- emmeans(effect_12h, pairwise ~ Trtmt_Day, data = diversity_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h <- emmeans(effect_24h, pairwise ~ Trtmt_Day, data = diversity_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h <- emmeans(effect_48h, pairwise ~ Trtmt_Day, data = diversity_48h)
## NOTE: Results may be misleading due to involvement in interactions
# create a data.frame for plotting
div_effects <- data.frame()
div_effects <- rbind(div_effects,
get_posthoc(posthoc_6h, heat_trtmt = 6),
get_posthoc(posthoc_12h, heat_trtmt = 12),
get_posthoc(posthoc_24h, heat_trtmt = 24),
get_posthoc(posthoc_48h, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
div_effects$Trtmt_Day <- factor(div_effects$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(div_effects$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
# plot with group labels
ggplot(div_effects,
aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = loCI, xmax = hiCI), height = 0.1) +
geom_text(position = position_dodge(width = 0.5),
aes(x=-2.5, label=groups)) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Shannon Diversity",
y="Heat duration",
title = "protegens as non-focal predictor. Extinct reps removed!")
################################
# Plot figure for main text: Figure 4a
################################
fig4a <- ggplot(div_effects,
aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = loCI, xmax = hiCI), height = 0.15) +
scale_x_continuous(limits = c(-4.35, 0.9), expand = c(0,0)) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Shannon Diversity",
y="Heat Duration (hrs)",
colour = "Treatment\nDay")
print(fig4a)
png(filename="./figures/Fig4_legend.png", width = 3.40, height = 2.90, units = "in", res=300)
print(fig4a)
dev.off()
## png
## 2
png(filename="./figures/Fig4A_plot.png", width = 4.48, height = 2.61, units = "in", res=300)
print(fig4a + theme(legend.position="none"))
dev.off()
## png
## 2
# cleanup
#rm(diversity_forFit, diversity_6h, diversity_12h, diversity_24h, diversity_48h,
# div_mods6h, div_mods12h, div_mods24h, div_mods48h,
# emm_6h, emm_12h, emm_24h, emm_48h, effect_6h, effect_12h, effect_24h, effect_48h,
# posthoc_6h, posthoc_12h, posthoc_24h, posthoc_48h,
# get_effsize_CIs, get_posthoc,
# div_effects_protegens, div_effects,
# plot_model_pred.CommRich, plot_model_pred.MU, fig4a)
The overall results match what we found above, showing that excluding the replicates that went extinct does not change the conclusions.
How is total community density impacted during and after heat? Let’s first plot it directly to get an idea of what we’re dealing with:
ggplot(absDen_forFit %>% filter(!is.na(Total_density)), # remove NA values
aes(y=Total_density, x=Day, fill=community_expected_mu)) +
facet_grid(protegens~as.factor(Heat)) +
geom_quasirandom(alpha=0.7, pch=21) +
scale_fill_viridis_c(option = "inferno", end=0.85) +
#scale_y_log10() +
scale_y_continuous(trans=scales::pseudo_log_trans(base = 10), # this prevents 0's from getting lost
breaks = 10^(-1:3)) +
labs(y = "Productivity",
fill = "Expected\nCommunity mu")
Now we repeat the same type of emmeans analysis as we did for diversity, but using the total density (a proxy for productivity). In this case I am a priori more comfortable with a Poisson or negative binomial family because total density is count-like data.
Remember that total densities below the threshold of detection from wells that DID recover during the recovery phase (i.e., those that did not go extinct) were assigned a value of epsilon corresponding to the threshold of detection. (Remaining NA values represent missing data due to pipetting mistakes or clogs during flow cytometry.) These epsilon values make up the majority of observations during the resistance phase for the longest heat duration. See further discussion in the section below.
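A quick way to see how prevalent these epsilon substitutions are is to tabulate the proportion of observations sitting at the detection threshold by treatment group. A standalone toy sketch (made-up numbers, not the experiment's data; in the real data you would group `absDen_forFit` by `Heat` and `Trtmt_Day`):

```r
# Toy sketch: proportion of observations at the detection threshold (epsilon),
# split by heat duration. Uses invented values for illustration only.
eps <- 0.01  # hypothetical detection threshold
toy <- data.frame(Heat = rep(c(6, 48), each = 4),
                  Total_density = c(5.2, 3.1, eps, 4.8, eps, eps, eps, 0.9))
aggregate(Total_density ~ Heat, data = toy, FUN = function(x) mean(x <= eps))
```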
# scale the data by its standard deviation
absDen_forFit$TotDensity_scale <- scale(absDen_forFit$Total_density,
scale = sd(absDen_forFit$Total_density, na.rm = TRUE),
center = FALSE)
# the max scaled value is ~7.9 and almost 3% of the data is 0 values
summary(absDen_forFit$TotDensity_scale)
## V1
## Min. :0.0000
## 1st Qu.:0.1041
## Median :0.2425
## Mean :0.6746
## 3rd Qu.:0.8045
## Max. :7.8953
## NA's :9
# proportion of zeros among non-missing values (na.rm is needed here,
# otherwise the NAs propagate and the result is NA)
mean(absDen_forFit$TotDensity_scale == 0, na.rm = TRUE)
# in fact, the total density data is even more long-tailed than the diversity data. That makes sense, since diversity is bounded above with only 4 possible species.
hist(absDen_forFit$TotDensity_scale)
# re-arrange the levels so that emmeans can be run:
absDen_forFit$Heat <- as.character(absDen_forFit$Heat)
absDen_forFit$Heat[which(absDen_forFit$Heat == 0)] <- "control"
# !!! emmeans expects the control to be the very *last* level !!!
absDen_forFit$Heat <- factor(absDen_forFit$Heat,
levels = c("6", "12", "24", "48", "control"))
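As a quick sanity check on this releveling pattern (toy vectors here, so it runs standalone): in emmeans, the `trt.vs.ctrlk` contrast family takes the *last* factor level as the reference by default, which is presumably why control must come last.

```r
# Toy check of the releveling used above: "control" must be the final level
# so that emmeans-style treatment-vs-control contrasts use it as reference.
heat_raw <- c("12", "control", "6", "48", "control", "24")
heat_fct <- factor(heat_raw, levels = c("6", "12", "24", "48", "control"))
stopifnot(tail(levels(heat_fct), 1) == "control")
table(heat_fct)  # counts per level, with control listed last
```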
# let's keep CommRich and Day as numeric for now while we look for the best fitting GLM family
# let's compare different GLM families
try_gaussian <- glmmTMB(TotDensity_scale ~ CommRich*Heat*Day*protegens,
data = absDen_forFit,
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_gaussian, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.936 0.828 0.464 0.604 0.104 0.448 0.092 0.328 0.448 0.5 0.616 0.668 0.692 0.144 0.468 0.484 0.828 0.54 0.136 1 ...
try_gamma <- glmmTMB(TotDensity_scale ~ CommRich*Heat*Day*protegens,
data = absDen_forFit,
family = ziGamma,
ziformula = ~1, # this needs to be added because there are 0 values in the data
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
## Warning in (function (start, objective, gradient = NULL, hessian = NULL, :
## NA/NaN function evaluation
simulateResiduals(fittedModel = try_gamma, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.812 0.964 0.5 0.748 0.336 0.532 0.336 0.612 0.46 0.596 0.64 0.78 0.816 0.368 0.508 0.636 0.748 0.664 0.376 0.952 ...
try_lognorm <- glmmTMB(TotDensity_scale ~ CommRich*Heat*Day*protegens,
data = absDen_forFit,
family = lognormal,
ziformula = ~1, # this needs to be added because there are 0 values in the data
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
simulateResiduals(fittedModel = try_lognorm, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.836 0.876 0.356 0.612 0.26 0.384 0.176 0.484 0.32 0.436 0.668 0.596 0.708 0.164 0.372 0.472 0.796 0.468 0.272 0.964 ...
try_LOGlognorm <- glmmTMB(log(TotDensity_scale + 1) ~ CommRich*Heat*Day*protegens,
data = absDen_forFit,
family = lognormal,
ziformula = ~1, # I'm keeping this as 0-inflated lognormal alone was already over-dispersed. So I want to see if the log(x+1) transformation sufficiently brings in the long tail.
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
simulateResiduals(fittedModel = try_LOGlognorm, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.776 0.912 0.356 0.644 0.26 0.396 0.196 0.48 0.32 0.448 0.624 0.644 0.74 0.192 0.384 0.48 0.776 0.492 0.252 0.928 ...
try_negbinom <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ CommRich*Heat*Day*protegens,
data = absDen_forFit,
family = nbinom2,
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_negbinom, plot = TRUE)
## DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.776 0.944 0.536 0.784 0.404 0.612 0.456 0.548 0.5558496 0.6970012 0.6674832 0.84 0.848 0.444 0.54 0.6242595 0.748 0.6612216 0.408 0.892 ...
try_negbinom0 <- glmmTMB(as.integer(Total_density * 1000) ~ CommRich*Heat*Day*protegens,
data = absDen_forFit,
family = nbinom2,
ziformula = ~1, # try zero inflated distribution
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_negbinom0, plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.752 0.952 0.508 0.76 0.344 0.568 0.364 0.516 0.504 0.676 0.644 0.876 0.872 0.384 0.564 0.64 0.732 0.736 0.404 0.956 ...
try_poisson <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ CommRich*Heat*Day*protegens,
data = absDen_forFit,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_poisson, plot = TRUE)
## DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.788 0.804 0.3376638 0.612 0.244 0.416 0.144 0.392 0.324256 0.496 0.592 0.668 0.6574457 0.188 0.384 0.436 0.756 0.508 0.2 0.924 ...
try_poisson0 <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ CommRich*Heat*Day*protegens,
data = absDen_forFit,
family = genpois,
ziformula = ~1, # try zero inflated distribution
control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_poisson0, plot = TRUE)
## DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details
## qu = 0.25, log(sigma) = -2.001813 : outer Newton did not converge fully.
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.784 0.824 0.384 0.636 0.188 0.3772009 0.232 0.464 0.364 0.4753274 0.612 0.68 0.7 0.208 0.356 0.436 0.748 0.5187042 0.208 0.952 ...
# let's check this with AIC and BIC
AIC(try_gaussian, try_gamma, try_lognorm, try_LOGlognorm,
try_negbinom, try_negbinom0, try_poisson, try_poisson0) %>% arrange(AIC)
BIC(try_gaussian, try_gamma, try_lognorm, try_LOGlognorm,
try_negbinom, try_negbinom0, try_poisson, try_poisson0) %>% arrange(BIC)
# clean up
rm(try_gaussian, try_gamma, try_lognorm, try_LOGlognorm, try_negbinom, try_negbinom0, try_poisson, try_poisson0)
Okay, so let's go with the (generalized) Poisson family. Its residuals look a little worse than the log(x+1)-transformed lognormal's, but I don't trust that model: log-transforming the response and then fitting a lognormal is hard to justify, whereas a Poisson-type family is exactly what I would expect for count-like data such as the total density.
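One detail worth making explicit: all of the count-family fits here convert the scaled densities to pseudo-counts via `as.integer(x * 1000)`. A standalone toy illustration of that conversion (values chosen to be exactly representable in floating point, so the truncation is predictable):

```r
# Toy illustration of the pseudo-count conversion used in the genpois fits:
# scaled densities are multiplied by 1000 and truncated toward zero by
# as.integer() (truncation, not rounding), so zeros stay zero and the
# relative ordering of densities is preserved.
dens_scaled <- c(0, 0.25, 0.5, 7.5)   # toy values, not the experiment's data
pseudo_counts <- as.integer(dens_scaled * 1000)
pseudo_counts  # 0 250 500 7500
```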
Now we will do the same thing we did for diversity: split up the data into subsets by heat pulse duration, calculate the best fit information criteria (i.e., AIC and BIC) for each data subset, and get the average across the entire data.
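The bookkeeping described above can be sketched in miniature. Assuming one named list of fitted models per heat-duration subset (toy `lm()` fits stand in here for the glmmTMB fits), each model's AIC can be summed across subsets and the totals ranked:

```r
# Hedged sketch of per-subset model selection: fit the same named model set to
# each data subset, then total each model's AIC across subsets. Toy data and
# lm() fits only; in the analysis the lists would hold the glmmTMB fits
# returned by fit_productivity_models().
set.seed(42)
make_subset <- function(n = 30) {
  x <- rnorm(n)
  data.frame(x = x, y = 2 * x + rnorm(n))  # y truly depends on x
}
fit_toy_models <- function(d) list("H0" = lm(y ~ 1, data = d),
                                   "+x" = lm(y ~ x, data = d))
fits_by_subset <- lapply(list(make_subset(), make_subset()), fit_toy_models)
aic_totals <- Reduce(`+`, lapply(fits_by_subset, function(fl) sapply(fl, AIC)))
sort(aic_totals)  # lower summed AIC = better support across all subsets
```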
# a function to fit the different models to the subsetted data:
fit_productivity_models <- function(data_subset) {
# create list for output
output.ls <- list()
# this is the simplest model. I'm fitting it to check for colinearity
output.ls[["simple"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ CommRich + Heat + Trtmt_Day + protegens + community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# this is another simple model to check for colinearity
output.ls[["simple resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ CommRich + Heat + Trtmt_Day + community_expected_mu + resistant,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# this is our null model:
output.ls[["H0"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# CommRich as an effect:
output.ls[["+CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# Resistance to 40C as an effect:
output.ls[["+resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + resistant,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*resistant,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# protegens presence as an effect:
output.ls[["+prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + protegens,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# expected community growth rate as an effect:
output.ls[["+mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# interactions of CommRich with resistance
output.ls[["+CommRich +resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + CommRich + resistant,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*CommRich +resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + resistant,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*CommRich*resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich*resistant,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*CommRich + CommRich*resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + CommRich*resistant,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*resist +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*resistant + CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*resist + resist*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*resistant + resistant*CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# interactions of CommRich with protegens
output.ls[["+CommRich +prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + CommRich + protegens,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*CommRich*prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich*protegens,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*CommRich +prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + protegens,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*CommRich + CommRich*prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + CommRich*protegens,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot + prot*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + protegens*CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# interactions of CommRich with expected community growth rate
output.ls[["+CommRich +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + CommRich + community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*CommRich*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich*community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*CommRich +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*CommRich + CommRich*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + CommRich*community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu + mu*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + community_expected_mu*CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# interactions of resistance with expected community growth rate
output.ls[["+resist +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + resistant + community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*resist*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*resistant*community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*resist +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*resistant + community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*resist + resist*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*resistant + resistant*community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu +resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + resistant,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu + mu*resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + community_expected_mu*resistant,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# interactions of protegens with expected community growth rate
output.ls[["+prot +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + protegens + community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens*community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot + prot*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + protegens*community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu +prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + protegens,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu + mu*prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + community_expected_mu*protegens,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
# I need to consider models with even more predictors:
# e.g., combining protegens, mu, and CommRich
output.ls[["*prot +mu +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + community_expected_mu + CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot*mu +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens*community_expected_mu + CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot + mu*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + community_expected_mu*CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot + prot*mu +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + protegens*community_expected_mu + CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot +mu + prot*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + community_expected_mu + protegens*CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*prot*CommRich +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens*CommRich + community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu +prot +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + protegens + CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu + mu*prot +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + community_expected_mu*protegens + CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu +prot + mu*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + protegens + community_expected_mu*CommRich,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*mu*CommRich +prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu*CommRich + protegens,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
output.ls[["*CommRich +prot +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + protegens + community_expected_mu,
data = data_subset,
family = genpois,
control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
return(output.ls)
}
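The comparison tables below call `npar_of_glmmTMB_fit()` and `AICc()`, which are presumably defined earlier in the document. For reference, here are minimal sketches of what those helpers could look like for glmmTMB fits (the `_sketch` names and implementations are my assumptions, not the document's actual definitions):

```r
# Sketch: number of estimated parameters in a glmmTMB fit,
# taken from the degrees-of-freedom attribute of the log-likelihood
npar_of_glmmTMB_fit_sketch <- function(fit) {
  attr(logLik(fit), "df")
}
# Sketch: small-sample-corrected AIC, AICc = AIC + 2k(k + 1)/(n - k - 1)
AICc_sketch <- function(fit) {
  k <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}
```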
# a function to plot model predictions against the data
# no colours are used; only the facets for Heat and protegens
plot_model_pred.nocolours <- function(mod_list, mod_name){
# create data.frame for plotting
absDen_predict <- cbind(mod_list[[mod_name]]$frame,
predicted=predict(mod_list[[mod_name]], type="response"))
# change the first column name for easier plotting
colnames(absDen_predict)[1] <- "observed"
# create the plot
out <- ggplot(absDen_predict,
aes(x=Trtmt_Day, y=observed)) +
facet_grid(protegens ~ Heat) +
geom_jitter(alpha=0.4) +
geom_line(aes(y=predicted, group=paste(Heat, protegens)), colour="red") +
scale_y_log10() +
labs(y="Absolute Density (rescaled)",
title=paste(mod_name, "model predictions"))
return(out)
}
# a function to plot model predictions against the data
# INOCULATED COMMUNITY RICHNESS is plotted as different colours
plot_model_pred.CommRich <- function(mod_list, mod_name){
# create data.frame for plotting
absDen_predict <- cbind(mod_list[[mod_name]]$frame,
predicted=predict(mod_list[[mod_name]], type="response"))
# change the first column name for easier plotting
colnames(absDen_predict)[1] <- "observed"
# create the plot
out <- ggplot(absDen_predict,
aes(x=Trtmt_Day, y=observed, colour=as.factor(CommRich))) +
facet_grid(protegens ~ Heat) +
geom_jitter(alpha=0.4) +
geom_line(aes(y=predicted, group=as.factor(CommRich))) +
scale_y_log10() +
scale_colour_viridis_d(option = "viridis", begin=0.1, end=0.85) +
labs(y="Absolute Density (rescaled)",
colour="CommRich",
title=paste(mod_name, "model predictions"))
return(out)
}
# a function to plot model predictions against the data
# EXPECTED COMMUNITY MU is plotted as different colours
plot_model_pred.MU <- function(mod_list, mod_name){
# create data.frame for plotting
absDen_predict <- cbind(mod_list[[mod_name]]$frame,
predicted=predict(mod_list[[mod_name]], type="response"))
# change the first column name for easier plotting
colnames(absDen_predict)[1] <- "observed"
# create the plot
out <- ggplot(absDen_predict,
aes(x=Trtmt_Day, y=observed,
colour=community_expected_mu, group=as.factor(community_expected_mu))) +
facet_grid(protegens ~ Heat) +
geom_jitter(alpha=0.4) +
geom_line(aes(y=predicted)) +
scale_y_log10() +
scale_colour_viridis_c(option = "inferno", end=0.85) +
labs(y="Absolute Density (rescaled)",
colour="Expected\nCommunity mu",
title=paste(mod_name, "model predictions"))
return(out)
}
}
####################
# 6h heat duration
####################
# grab just the treatment with its associated control data
absDen_6h <- rbind(absDen_forFit %>% filter(Heat == "6"),
absDen_forFit %>% filter(Heat == "control", Day < 4))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_6h$Trtmt_Day <- "resist"
absDen_6h$Trtmt_Day[absDen_6h$Day == 2] <- "recov_1"
absDen_6h$Trtmt_Day[absDen_6h$Day == 3] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_6h$Trtmt_Day <- as.factor(absDen_6h$Trtmt_Day)
absDen_6h$Heat <- droplevels(absDen_6h$Heat)
absDen_6h$resistant <- as.factor(absDen_6h$resistant)
absDen_6h$protegens <- as.factor(absDen_6h$protegens)
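The subsetting and recoding above is repeated nearly verbatim in the 12h and 24h subsections below; a hypothetical helper could reduce the duplication. This is only a sketch, assuming the column names used above (`Day`, `Trtmt_Day`, `Heat`, `resistant`, `protegens`); the `prep_heat_days` name is my own:

```r
# Sketch of a helper: the rbind/filter step varies by treatment, so the caller
# still does that; the shared recoding (Trtmt_Day labels, factor conversions)
# lives here. recov1_day/recov2_day are the first and second recovery days.
prep_heat_days <- function(dat, recov1_day, recov2_day) {
  dat$Trtmt_Day <- "resist"
  dat$Trtmt_Day[dat$Day == recov1_day] <- "recov_1"
  dat$Trtmt_Day[dat$Day == recov2_day] <- "recov_2"
  dat$Trtmt_Day <- as.factor(dat$Trtmt_Day)
  dat$Heat      <- droplevels(dat$Heat)
  dat$resistant <- as.factor(dat$resistant)
  dat$protegens <- as.factor(dat$protegens)
  dat
}
# e.g., for the 6h treatment:
# absDen_6h <- prep_heat_days(
#   rbind(absDen_forFit %>% filter(Heat == "6"),
#         absDen_forFit %>% filter(Heat == "control", Day < 4)),
#   recov1_day = 2, recov2_day = 3)
```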
# fit different models:
absDen_mods6h <- fit_productivity_models(absDen_6h)
# check the simplest possible models for multicollinearity
check_collinearity(absDen_mods6h[["simple"]])
check_collinearity(absDen_mods6h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods6h, npar_of_glmmTMB_fit),
AIC = sapply(absDen_mods6h, AIC),
AICc = sapply(absDen_mods6h, AICc),
BIC = sapply(absDen_mods6h, BIC)) %>%
mutate(dAIC = AIC - min(AIC), # conventional deltas: 0 = best model
dAICc = AICc - min(AICc),
dBIC = BIC - min(BIC)) %>% arrange(BIC)
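As a complementary summary of the information-criterion table, Akaike weights express each model's relative support within the candidate set. A sketch, assuming the same `AICc()` helper used above (this step is not part of the original analysis):

```r
# Akaike weights from AICc: w_i = exp(-0.5 * delta_i) / sum_j exp(-0.5 * delta_j),
# where delta_i = AICc_i - min(AICc), so the best-supported model has delta = 0
aicc_vals <- sapply(absDen_mods6h, AICc)
delta <- aicc_vals - min(aicc_vals)
aicc_weights <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))
sort(round(aicc_weights, 3), decreasing = TRUE)
```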
# plot the best model for 6h:
print(plot_model_pred.MU(mod_list=absDen_mods6h, mod_name="*mu +prot"))
# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods6h, mod_name="*prot*mu +CommRich"))
# plot the preferred model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods6h, mod_name="*prot +mu + prot*CommRich"))
# check the fit and estimates of the best model for the complete data:
simulateResiduals(fittedModel = absDen_mods6h[["*prot*mu +CommRich"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.624 0.96 0.724 0.6761063 0.992 0.872 0.584 0.02 0.916 0.4455416 0.964 1 0.752 0.4 0.864 0.4146115 0.112 0.5655875 0.5192903 0.572 ...
summary(absDen_mods6h[["*prot*mu +CommRich"]])
## Family: genpois ( log )
## Formula:
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens *
## community_expected_mu + CommRich
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## 5633.3 5736.5 -2790.6 5581.3 366
##
##
## Dispersion parameter for genpois family (): 304
##
## Conditional model:
## Estimate
## (Intercept) 9.15663
## Heatcontrol -2.69889
## Trtmt_Dayrecov_2 -0.53159
## Trtmt_Dayresist -3.32573
## protegens1 -2.26618
## community_expected_mu -2.09080
## CommRich 0.02518
## Heatcontrol:Trtmt_Dayrecov_2 2.46173
## Heatcontrol:Trtmt_Dayresist 4.55108
## Heatcontrol:protegens1 0.09441
## Trtmt_Dayrecov_2:protegens1 -0.37024
## Trtmt_Dayresist:protegens1 3.81032
## Heatcontrol:community_expected_mu 2.94727
## Trtmt_Dayrecov_2:community_expected_mu 0.58031
## Trtmt_Dayresist:community_expected_mu 3.86389
## protegens1:community_expected_mu 1.15169
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 -0.71601
## Heatcontrol:Trtmt_Dayresist:protegens1 -3.77165
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu -3.05173
## Heatcontrol:Trtmt_Dayresist:community_expected_mu -5.13012
## Heatcontrol:protegens1:community_expected_mu -0.31375
## Trtmt_Dayrecov_2:protegens1:community_expected_mu 0.02075
## Trtmt_Dayresist:protegens1:community_expected_mu -4.33957
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu 1.02566
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu 4.73448
## Std. Error
## (Intercept) 0.40488
## Heatcontrol 0.59547
## Trtmt_Dayrecov_2 0.56176
## Trtmt_Dayresist 0.57510
## protegens1 1.00410
## community_expected_mu 0.48536
## CommRich 0.03105
## Heatcontrol:Trtmt_Dayrecov_2 0.96092
## Heatcontrol:Trtmt_Dayresist 0.90474
## Heatcontrol:protegens1 1.51133
## Trtmt_Dayrecov_2:protegens1 1.48404
## Trtmt_Dayresist:protegens1 1.43502
## Heatcontrol:community_expected_mu 0.68689
## Trtmt_Dayrecov_2:community_expected_mu 0.66397
## Trtmt_Dayresist:community_expected_mu 0.65607
## protegens1:community_expected_mu 1.12505
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 2.30808
## Heatcontrol:Trtmt_Dayresist:protegens1 2.13203
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu 1.11975
## Heatcontrol:Trtmt_Dayresist:community_expected_mu 1.03026
## Heatcontrol:protegens1:community_expected_mu 1.67526
## Trtmt_Dayrecov_2:protegens1:community_expected_mu 1.66028
## Trtmt_Dayresist:protegens1:community_expected_mu 1.60061
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu 2.56853
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu 2.36475
## z value Pr(>|z|)
## (Intercept) 22.616 < 2e-16
## Heatcontrol -4.532 5.83e-06
## Trtmt_Dayrecov_2 -0.946 0.34400
## Trtmt_Dayresist -5.783 7.34e-09
## protegens1 -2.257 0.02401
## community_expected_mu -4.308 1.65e-05
## CommRich 0.811 0.41732
## Heatcontrol:Trtmt_Dayrecov_2 2.562 0.01041
## Heatcontrol:Trtmt_Dayresist 5.030 4.90e-07
## Heatcontrol:protegens1 0.062 0.95019
## Trtmt_Dayrecov_2:protegens1 -0.249 0.80299
## Trtmt_Dayresist:protegens1 2.655 0.00793
## Heatcontrol:community_expected_mu 4.291 1.78e-05
## Trtmt_Dayrecov_2:community_expected_mu 0.874 0.38212
## Trtmt_Dayresist:community_expected_mu 5.889 3.88e-09
## protegens1:community_expected_mu 1.024 0.30599
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 -0.310 0.75640
## Heatcontrol:Trtmt_Dayresist:protegens1 -1.769 0.07689
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu -2.725 0.00642
## Heatcontrol:Trtmt_Dayresist:community_expected_mu -4.979 6.38e-07
## Heatcontrol:protegens1:community_expected_mu -0.187 0.85144
## Trtmt_Dayrecov_2:protegens1:community_expected_mu 0.012 0.99003
## Trtmt_Dayresist:protegens1:community_expected_mu -2.711 0.00670
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu 0.399 0.68966
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu 2.002 0.04527
##
## (Intercept) ***
## Heatcontrol ***
## Trtmt_Dayrecov_2
## Trtmt_Dayresist ***
## protegens1 *
## community_expected_mu ***
## CommRich
## Heatcontrol:Trtmt_Dayrecov_2 *
## Heatcontrol:Trtmt_Dayresist ***
## Heatcontrol:protegens1
## Trtmt_Dayrecov_2:protegens1
## Trtmt_Dayresist:protegens1 **
## Heatcontrol:community_expected_mu ***
## Trtmt_Dayrecov_2:community_expected_mu
## Trtmt_Dayresist:community_expected_mu ***
## protegens1:community_expected_mu
## Heatcontrol:Trtmt_Dayrecov_2:protegens1
## Heatcontrol:Trtmt_Dayresist:protegens1 .
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu **
## Heatcontrol:Trtmt_Dayresist:community_expected_mu ***
## Heatcontrol:protegens1:community_expected_mu
## Trtmt_Dayrecov_2:protegens1:community_expected_mu
## Trtmt_Dayresist:protegens1:community_expected_mu **
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods6h[["*prot +mu + prot*CommRich"]], plot = TRUE)
summary(absDen_mods6h[["*prot +mu + prot*CommRich"]])
## Family: genpois ( log )
## Formula:
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +
## community_expected_mu + protegens * CommRich
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## 5662.5 5726.0 -2815.2 5630.5 376
##
##
## Dispersion parameter for genpois family (): 368
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.839618 0.228850 34.26 < 2e-16
## Heatcontrol -0.180861 0.118842 -1.52 0.1280
## Trtmt_Dayrecov_2 -0.018307 0.097866 -0.19 0.8516
## Trtmt_Dayresist -0.008151 0.101418 -0.08 0.9359
## protegens1 -1.304007 0.181747 -7.17 7.24e-13
## community_expected_mu -0.491750 0.214502 -2.29 0.0219
## CommRich -0.003971 0.044970 -0.09 0.9296
## Heatcontrol:Trtmt_Dayrecov_2 -0.137528 0.167598 -0.82 0.4119
## Heatcontrol:Trtmt_Dayresist 0.174712 0.164347 1.06 0.2878
## Heatcontrol:protegens1 -0.061370 0.180549 -0.34 0.7339
## Trtmt_Dayrecov_2:protegens1 -0.333568 0.165252 -2.02 0.0435
## Trtmt_Dayresist:protegens1 0.061134 0.164067 0.37 0.7094
## protegens1:CommRich 0.032276 0.064557 0.50 0.6171
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 0.075049 0.261004 0.29 0.7737
## Heatcontrol:Trtmt_Dayresist:protegens1 0.251918 0.250033 1.01 0.3137
##
## (Intercept) ***
## Heatcontrol
## Trtmt_Dayrecov_2
## Trtmt_Dayresist
## protegens1 ***
## community_expected_mu *
## CommRich
## Heatcontrol:Trtmt_Dayrecov_2
## Heatcontrol:Trtmt_Dayresist
## Heatcontrol:protegens1
## Trtmt_Dayrecov_2:protegens1 *
## Trtmt_Dayresist:protegens1
## protegens1:CommRich
## Heatcontrol:Trtmt_Dayrecov_2:protegens1
## Heatcontrol:Trtmt_Dayresist:protegens1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# 12h heat duration
####################
# grab just the treatment with its associated control data
absDen_12h <- rbind(absDen_forFit %>% filter(Heat == "12", Day > 1),
absDen_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_12h$Trtmt_Day <- "resist"
absDen_12h$Trtmt_Day[absDen_12h$Day == 3] <- "recov_1"
absDen_12h$Trtmt_Day[absDen_12h$Day == 4] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_12h$Trtmt_Day <- as.factor(absDen_12h$Trtmt_Day)
absDen_12h$Heat <- droplevels(absDen_12h$Heat)
absDen_12h$resistant <- as.factor(absDen_12h$resistant)
absDen_12h$protegens <- as.factor(absDen_12h$protegens)
# fit different models:
absDen_mods12h <- fit_productivity_models(absDen_12h)
# check the simplest possible models for multicollinearity
check_collinearity(absDen_mods12h[["simple"]])
check_collinearity(absDen_mods12h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods12h, npar_of_glmmTMB_fit),
AIC = sapply(absDen_mods12h, AIC),
AICc = sapply(absDen_mods12h, AICc),
BIC = sapply(absDen_mods12h, BIC)) %>%
mutate(dAIC = AIC - min(AIC), # conventional deltas: 0 = best model
dAICc = AICc - min(AICc),
dBIC = BIC - min(BIC)) %>% arrange(BIC)
# plot the best model for 12h:
print(plot_model_pred.MU(mod_list=absDen_mods12h, mod_name="*mu +prot"))
# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods12h, mod_name="*prot*mu +CommRich"))
# plot the preferred model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods12h, mod_name="*prot +mu + prot*CommRich"))
# check the fit and estimates of the best model for the complete data:
simulateResiduals(fittedModel = absDen_mods12h[["*prot*mu +CommRich"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.876 0.632 0.2287389 0.686251 0.2893171 0.468 0.08077386 0.628 0.82 0.42 0.672 0.708 0.532 0.4667235 0.568 0.584 0.4163186 0.3632073 0.5082269 0.92 ...
summary(absDen_mods12h[["*prot*mu +CommRich"]])
## Family: genpois ( log )
## Formula:
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens *
## community_expected_mu + CommRich
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## 4909.8 5010.3 -2428.9 4857.8 327
##
##
## Dispersion parameter for genpois family (): 367
##
## Conditional model:
## Estimate
## (Intercept) 8.09791
## Heatcontrol 0.25448
## Trtmt_Dayrecov_2 0.71753
## Trtmt_Dayresist -4.82578
## protegens1 -0.74091
## community_expected_mu -1.10117
## CommRich 0.05516
## Heatcontrol:Trtmt_Dayrecov_2 -1.20770
## Heatcontrol:Trtmt_Dayresist 2.87963
## Heatcontrol:protegens1 -2.54842
## Trtmt_Dayrecov_2:protegens1 -3.98416
## Trtmt_Dayresist:protegens1 3.37405
## Heatcontrol:community_expected_mu -0.50272
## Trtmt_Dayrecov_2:community_expected_mu -1.27493
## Trtmt_Dayresist:community_expected_mu 4.86917
## protegens1:community_expected_mu -0.74334
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 4.54307
## Heatcontrol:Trtmt_Dayresist:protegens1 -2.22955
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu 1.98282
## Heatcontrol:Trtmt_Dayresist:community_expected_mu -2.37210
## Heatcontrol:protegens1:community_expected_mu 2.68624
## Trtmt_Dayrecov_2:protegens1:community_expected_mu 4.67950
## Trtmt_Dayresist:protegens1:community_expected_mu -3.24580
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu -5.28624
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu 2.11439
## Std. Error
## (Intercept) 0.66174
## Heatcontrol 0.95852
## Trtmt_Dayrecov_2 1.00213
## Trtmt_Dayresist 0.87125
## protegens1 1.26176
## community_expected_mu 0.79076
## CommRich 0.04032
## Heatcontrol:Trtmt_Dayrecov_2 1.35549
## Heatcontrol:Trtmt_Dayresist 1.20260
## Heatcontrol:protegens1 1.87852
## Trtmt_Dayrecov_2:protegens1 1.90107
## Trtmt_Dayresist:protegens1 1.71937
## Heatcontrol:community_expected_mu 1.13250
## Trtmt_Dayrecov_2:community_expected_mu 1.20788
## Trtmt_Dayresist:community_expected_mu 1.00848
## protegens1:community_expected_mu 1.43013
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 2.78351
## Heatcontrol:Trtmt_Dayresist:protegens1 2.50706
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu 1.60793
## Heatcontrol:Trtmt_Dayresist:community_expected_mu 1.38954
## Heatcontrol:protegens1:community_expected_mu 2.11453
## Trtmt_Dayrecov_2:protegens1:community_expected_mu 2.15578
## Trtmt_Dayresist:protegens1:community_expected_mu 1.92965
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu 3.13427
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu 2.79618
## z value Pr(>|z|)
## (Intercept) 12.237 < 2e-16
## Heatcontrol 0.265 0.7906
## Trtmt_Dayrecov_2 0.716 0.4740
## Trtmt_Dayresist -5.539 3.04e-08
## protegens1 -0.587 0.5571
## community_expected_mu -1.393 0.1638
## CommRich 1.368 0.1713
## Heatcontrol:Trtmt_Dayrecov_2 -0.891 0.3729
## Heatcontrol:Trtmt_Dayresist 2.394 0.0166
## Heatcontrol:protegens1 -1.357 0.1749
## Trtmt_Dayrecov_2:protegens1 -2.096 0.0361
## Trtmt_Dayresist:protegens1 1.962 0.0497
## Heatcontrol:community_expected_mu -0.444 0.6571
## Trtmt_Dayrecov_2:community_expected_mu -1.056 0.2912
## Trtmt_Dayresist:community_expected_mu 4.828 1.38e-06
## protegens1:community_expected_mu -0.520 0.6032
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 1.632 0.1027
## Heatcontrol:Trtmt_Dayresist:protegens1 -0.889 0.3738
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu 1.233 0.2175
## Heatcontrol:Trtmt_Dayresist:community_expected_mu -1.707 0.0878
## Heatcontrol:protegens1:community_expected_mu 1.270 0.2040
## Trtmt_Dayrecov_2:protegens1:community_expected_mu 2.171 0.0300
## Trtmt_Dayresist:protegens1:community_expected_mu -1.682 0.0926
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu -1.687 0.0917
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu 0.756 0.4495
##
## (Intercept) ***
## Heatcontrol
## Trtmt_Dayrecov_2
## Trtmt_Dayresist ***
## protegens1
## community_expected_mu
## CommRich
## Heatcontrol:Trtmt_Dayrecov_2
## Heatcontrol:Trtmt_Dayresist *
## Heatcontrol:protegens1
## Trtmt_Dayrecov_2:protegens1 *
## Trtmt_Dayresist:protegens1 *
## Heatcontrol:community_expected_mu
## Trtmt_Dayrecov_2:community_expected_mu
## Trtmt_Dayresist:community_expected_mu ***
## protegens1:community_expected_mu
## Heatcontrol:Trtmt_Dayrecov_2:protegens1
## Heatcontrol:Trtmt_Dayresist:protegens1
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu
## Heatcontrol:Trtmt_Dayresist:community_expected_mu .
## Heatcontrol:protegens1:community_expected_mu
## Trtmt_Dayrecov_2:protegens1:community_expected_mu *
## Trtmt_Dayresist:protegens1:community_expected_mu .
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu .
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods12h[["*prot +mu + prot*CommRich"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.928 0.884 0.2 0.628 0.2543694 0.488 0 0.598251 0.904 0.4106342 0.672 0.548 0.52 0.4607739 0.572 0.6 0.1 0.3307235 0.852 0.964 ...
summary(absDen_mods12h[["*prot +mu + prot*CommRich"]])
## Family: genpois ( log )
## Formula:
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +
## community_expected_mu + protegens * CommRich
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## 4935.9 4997.8 -2452.0 4903.9 337
##
##
## Dispersion parameter for genpois family (): 446
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.01699 0.32692 21.464 < 2e-16
## Heatcontrol -0.21263 0.13850 -1.535 0.1247
## Trtmt_Dayrecov_2 -0.37577 0.14055 -2.674 0.0075
## Trtmt_Dayresist -0.86993 0.16953 -5.131 2.88e-07
## protegens1 -1.25584 0.23759 -5.286 1.25e-07
## community_expected_mu 0.07319 0.29296 0.250 0.8027
## CommRich 0.11143 0.07060 1.578 0.1145
## Heatcontrol:Trtmt_Dayrecov_2 0.51493 0.20295 2.537 0.0112
## Heatcontrol:Trtmt_Dayresist 1.11031 0.21161 5.247 1.55e-07
## Heatcontrol:protegens1 -0.09314 0.20466 -0.455 0.6490
## Trtmt_Dayrecov_2:protegens1 0.18835 0.20302 0.928 0.3535
## Trtmt_Dayresist:protegens1 0.89448 0.21920 4.081 4.49e-05
## protegens1:CommRich -0.08730 0.08660 -1.008 0.3134
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 -0.17894 0.29796 -0.601 0.5481
## Heatcontrol:Trtmt_Dayresist:protegens1 -0.71900 0.29616 -2.428 0.0152
##
## (Intercept) ***
## Heatcontrol
## Trtmt_Dayrecov_2 **
## Trtmt_Dayresist ***
## protegens1 ***
## community_expected_mu
## CommRich
## Heatcontrol:Trtmt_Dayrecov_2 *
## Heatcontrol:Trtmt_Dayresist ***
## Heatcontrol:protegens1
## Trtmt_Dayrecov_2:protegens1
## Trtmt_Dayresist:protegens1 ***
## protegens1:CommRich
## Heatcontrol:Trtmt_Dayrecov_2:protegens1
## Heatcontrol:Trtmt_Dayresist:protegens1 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# 24h heat duration
####################
# grab just the treatment with its associated control data
absDen_24h <- rbind(absDen_forFit %>% filter(Heat == "24", Day > 1),
absDen_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_24h$Trtmt_Day <- "resist"
absDen_24h$Trtmt_Day[absDen_24h$Day == 3] <- "recov_1"
absDen_24h$Trtmt_Day[absDen_24h$Day == 4] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_24h$Trtmt_Day <- as.factor(absDen_24h$Trtmt_Day)
absDen_24h$Heat <- droplevels(absDen_24h$Heat)
absDen_24h$resistant <- as.factor(absDen_24h$resistant)
absDen_24h$protegens <- as.factor(absDen_24h$protegens)
# fit different models:
absDen_mods24h <- fit_productivity_models(absDen_24h)
# check the simplest possible models for multicollinearity
check_collinearity(absDen_mods24h[["simple"]])
check_collinearity(absDen_mods24h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods24h, npar_of_glmmTMB_fit),
AIC = sapply(absDen_mods24h, AIC),
AICc = sapply(absDen_mods24h, AICc),
BIC = sapply(absDen_mods24h, BIC)) %>%
mutate(dAIC = AIC - min(AIC), # conventional deltas: 0 = best model
dAICc = AICc - min(AICc),
dBIC = BIC - min(BIC)) %>% arrange(BIC)
# plot a top model that is unique for 24h:
# note that this is the 3rd best model for 24h:
print(plot_model_pred.CommRich(mod_list=absDen_mods24h, mod_name="*prot*CommRich +mu"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
# plot the best model for the complete data:
# note that this is ALSO the top-ranked model for 24h:
print(plot_model_pred.MU(mod_list=absDen_mods24h, mod_name="*prot*mu +CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
# plot the preferred model for the complete data:
# note that this is ALSO the 2nd best model for 24h:
print(plot_model_pred.MU(mod_list=absDen_mods24h, mod_name="*prot +mu + prot*CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
# check the fit and estimates of the best model for the complete data:
simulateResiduals(fittedModel = absDen_mods24h[["*prot*mu +CommRich"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.4549449 0.446523 0.544 0.2972669 0.8025741 0.6789442 0.612 0.4374037 0.2729079 0.3030947 0.664 0.592 0.303566 0.1624146 0.14 0.1910651 0.2 0.14 0.3627356 0.3717779 ...
summary(absDen_mods24h[["*prot*mu +CommRich"]])
## Family: genpois ( log )
## Formula:
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens *
## community_expected_mu + CommRich
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## 4966.6 5067.2 -2457.3 4914.6 328
##
##
## Dispersion parameter for genpois family (): 898
##
## Conditional model:
## Estimate
## (Intercept) -1.06385
## Heatcontrol 9.28614
## Trtmt_Dayrecov_2 3.74130
## Trtmt_Dayresist 6.35728
## protegens1 7.06537
## community_expected_mu 7.83934
## CommRich 0.17510
## Heatcontrol:Trtmt_Dayrecov_2 -4.12166
## Heatcontrol:Trtmt_Dayresist -8.30158
## Heatcontrol:protegens1 -10.42663
## Trtmt_Dayrecov_2:protegens1 -4.73014
## Trtmt_Dayresist:protegens1 -4.97428
## Heatcontrol:community_expected_mu -9.33776
## Trtmt_Dayrecov_2:community_expected_mu -3.14872
## Trtmt_Dayresist:community_expected_mu -9.08213
## protegens1:community_expected_mu -7.86491
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 5.25834
## Heatcontrol:Trtmt_Dayresist:protegens1 6.27755
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu 3.74696
## Heatcontrol:Trtmt_Dayresist:community_expected_mu 11.59928
## Heatcontrol:protegens1:community_expected_mu 9.98283
## Trtmt_Dayrecov_2:protegens1:community_expected_mu 3.77422
## Trtmt_Dayresist:protegens1:community_expected_mu 7.04546
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu -4.35323
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu -8.42221
## Std. Error
## (Intercept) 0.61438
## Heatcontrol 1.09770
## Trtmt_Dayrecov_2 0.78556
## Trtmt_Dayresist 1.21326
## protegens1 1.33830
## community_expected_mu 0.58303
## CommRich 0.04775
## Heatcontrol:Trtmt_Dayrecov_2 1.44037
## Heatcontrol:Trtmt_Dayresist 1.62354
## Heatcontrol:protegens1 2.07681
## Trtmt_Dayrecov_2:protegens1 1.91362
## Trtmt_Dayresist:protegens1 2.16799
## Heatcontrol:community_expected_mu 1.21212
## Trtmt_Dayrecov_2:community_expected_mu 0.75005
## Trtmt_Dayresist:community_expected_mu 1.29511
## protegens1:community_expected_mu 1.42804
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 2.99065
## Heatcontrol:Trtmt_Dayresist:protegens1 3.00853
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu 1.58514
## Heatcontrol:Trtmt_Dayresist:community_expected_mu 1.79943
## Heatcontrol:protegens1:community_expected_mu 2.28848
## Trtmt_Dayrecov_2:protegens1:community_expected_mu 2.06167
## Trtmt_Dayresist:protegens1:community_expected_mu 2.36909
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu 3.30114
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu 3.31968
## z value Pr(>|z|)
## (Intercept) -1.732 0.083348
## Heatcontrol 8.460 < 2e-16
## Trtmt_Dayrecov_2 4.763 1.91e-06
## Trtmt_Dayresist 5.240 1.61e-07
## protegens1 5.279 1.30e-07
## community_expected_mu 13.446 < 2e-16
## CommRich 3.667 0.000246
## Heatcontrol:Trtmt_Dayrecov_2 -2.862 0.004216
## Heatcontrol:Trtmt_Dayresist -5.113 3.17e-07
## Heatcontrol:protegens1 -5.020 5.15e-07
## Trtmt_Dayrecov_2:protegens1 -2.472 0.013443
## Trtmt_Dayresist:protegens1 -2.294 0.021767
## Heatcontrol:community_expected_mu -7.704 1.32e-14
## Trtmt_Dayrecov_2:community_expected_mu -4.198 2.69e-05
## Trtmt_Dayresist:community_expected_mu -7.013 2.34e-12
## protegens1:community_expected_mu -5.507 3.64e-08
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 1.758 0.078703
## Heatcontrol:Trtmt_Dayresist:protegens1 2.087 0.036925
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu 2.364 0.018088
## Heatcontrol:Trtmt_Dayresist:community_expected_mu 6.446 1.15e-10
## Heatcontrol:protegens1:community_expected_mu 4.362 1.29e-05
## Trtmt_Dayrecov_2:protegens1:community_expected_mu 1.831 0.067151
## Trtmt_Dayresist:protegens1:community_expected_mu 2.974 0.002940
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu -1.319 0.187268
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu -2.537 0.011179
##
## (Intercept) .
## Heatcontrol ***
## Trtmt_Dayrecov_2 ***
## Trtmt_Dayresist ***
## protegens1 ***
## community_expected_mu ***
## CommRich ***
## Heatcontrol:Trtmt_Dayrecov_2 **
## Heatcontrol:Trtmt_Dayresist ***
## Heatcontrol:protegens1 ***
## Trtmt_Dayrecov_2:protegens1 *
## Trtmt_Dayresist:protegens1 *
## Heatcontrol:community_expected_mu ***
## Trtmt_Dayrecov_2:community_expected_mu ***
## Trtmt_Dayresist:community_expected_mu ***
## protegens1:community_expected_mu ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 .
## Heatcontrol:Trtmt_Dayresist:protegens1 *
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu *
## Heatcontrol:Trtmt_Dayresist:community_expected_mu ***
## Heatcontrol:protegens1:community_expected_mu ***
## Trtmt_Dayrecov_2:protegens1:community_expected_mu .
## Trtmt_Dayresist:protegens1:community_expected_mu **
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods24h[["*prot +mu + prot*CommRich"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.5304174 0.3788716 0.2106584 0.8585741 0.728 0.5731777 0.5213509 0.12 0.2060813 0.3513732 0.728 0.768 0.6823773 0.1184146 0.192 0.1202988 0.552 0.316 0.068 0.2980919 ...
summary(absDen_mods24h[["*prot +mu + prot*CommRich"]])
## Family: genpois ( log )
## Formula:
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +
## community_expected_mu + protegens * CommRich
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## 5032.2 5094.1 -2500.1 5000.2 338
##
##
## Dispersion parameter for genpois family (): 1.13e+03
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.63860 0.34784 7.586 3.31e-14
## Heatcontrol 0.46860 0.25896 1.810 0.070365
## Trtmt_Dayrecov_2 1.04463 0.21234 4.920 8.67e-07
## Trtmt_Dayresist -2.14902 0.28138 -7.637 2.22e-14
## protegens1 1.13299 0.26296 4.309 1.64e-05
## community_expected_mu 2.78569 0.37170 7.494 6.66e-14
## CommRich 0.73825 0.07891 9.356 < 2e-16
## Heatcontrol:Trtmt_Dayrecov_2 -0.80677 0.28113 -2.870 0.004108
## Heatcontrol:Trtmt_Dayresist 2.69517 0.34109 7.902 2.75e-15
## Heatcontrol:protegens1 -0.99125 0.30543 -3.245 0.001173
## Trtmt_Dayrecov_2:protegens1 -1.43914 0.26652 -5.400 6.67e-08
## Trtmt_Dayresist:protegens1 1.64247 0.32041 5.126 2.96e-07
## protegens1:CommRich -0.67710 0.09426 -7.183 6.82e-13
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 1.38150 0.36498 3.785 0.000154
## Heatcontrol:Trtmt_Dayresist:protegens1 -1.77160 0.40483 -4.376 1.21e-05
##
## (Intercept) ***
## Heatcontrol .
## Trtmt_Dayrecov_2 ***
## Trtmt_Dayresist ***
## protegens1 ***
## community_expected_mu ***
## CommRich ***
## Heatcontrol:Trtmt_Dayrecov_2 **
## Heatcontrol:Trtmt_Dayresist ***
## Heatcontrol:protegens1 **
## Trtmt_Dayrecov_2:protegens1 ***
## Trtmt_Dayresist:protegens1 ***
## protegens1:CommRich ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 ***
## Heatcontrol:Trtmt_Dayresist:protegens1 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# 48h heat duration
####################
# grab just the treatment with its associated control data
absDen_48h <- rbind(absDen_forFit %>% filter(Heat == "48", Day > 2),
absDen_forFit %>% filter(Heat == "control", Day > 2))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_48h$Trtmt_Day <- "resist"
absDen_48h$Trtmt_Day[absDen_48h$Day == 4] <- "recov_1"
absDen_48h$Trtmt_Day[absDen_48h$Day == 5] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_48h$Trtmt_Day <- as.factor(absDen_48h$Trtmt_Day)
absDen_48h$Heat <- droplevels(absDen_48h$Heat)
absDen_48h$resistant <- as.factor(absDen_48h$resistant)
absDen_48h$protegens <- as.factor(absDen_48h$protegens)
# fit different models:
absDen_mods48h <- fit_productivity_models(absDen_48h)
# check the simplest possible models for multicollinearity
check_collinearity(absDen_mods48h[["simple"]])
check_collinearity(absDen_mods48h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods48h, npar_of_glmmTMB_fit),
AIC = sapply(absDen_mods48h, AIC),
AICc = sapply(absDen_mods48h, AICc),
BIC = sapply(absDen_mods48h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
# plot the best model for 48h:
print(plot_model_pred.CommRich(mod_list=absDen_mods48h, mod_name="*prot +CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods48h, mod_name="*prot*mu +CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
# plot the preferred model for the complete data:
print(plot_model_pred.CommRich(mod_list=absDen_mods48h, mod_name="*prot +mu + prot*CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
# check the fit and estimates of the best model for the complete data:
simulateResiduals(fittedModel = absDen_mods48h[["*prot*mu +CommRich"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.6730612 0.6791969 0.7916869 0.6795164 0.6937438 0.7163217 0.788161 0.6591218 0.6644682 0.6291745 0.2536879 0.6056463 0.5591551 0.3618208 0.2980939 0.2284036 0.3769242 0.2488986 0.2111782 0.7354215 ...
summary(absDen_mods48h[["*prot*mu +CommRich"]])
## Family: genpois ( log )
## Formula:
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens *
## community_expected_mu + CommRich
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## 3750.0 3847.9 -1849.0 3698.0 293
##
##
## Dispersion parameter for genpois family (): 555
##
## Conditional model:
## Estimate
## (Intercept) -4.291133
## Heatcontrol 12.213336
## Trtmt_Dayrecov_2 0.002528
## Trtmt_Dayresist 8.305382
## protegens1 10.632949
## community_expected_mu 8.001702
## CommRich 0.035916
## Heatcontrol:Trtmt_Dayrecov_2 0.809306
## Heatcontrol:Trtmt_Dayresist -7.854578
## Heatcontrol:protegens1 -13.173287
## Trtmt_Dayrecov_2:protegens1 -0.476146
## Trtmt_Dayresist:protegens1 -11.203126
## Heatcontrol:community_expected_mu -8.837146
## Trtmt_Dayrecov_2:community_expected_mu -0.004589
## Trtmt_Dayresist:community_expected_mu -9.397868
## protegens1:community_expected_mu -8.575331
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 0.061785
## Heatcontrol:Trtmt_Dayresist:protegens1 10.691756
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu -1.998287
## Heatcontrol:Trtmt_Dayresist:community_expected_mu 8.748105
## Heatcontrol:protegens1:community_expected_mu 9.795295
## Trtmt_Dayrecov_2:protegens1:community_expected_mu 0.293979
## Trtmt_Dayresist:protegens1:community_expected_mu 8.938939
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu 1.008883
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu -8.388933
## Std. Error
## (Intercept) 1.696541
## Heatcontrol 1.810265
## Trtmt_Dayrecov_2 2.379266
## Trtmt_Dayresist 6.583168
## protegens1 2.047178
## community_expected_mu 1.646305
## CommRich 0.048699
## Heatcontrol:Trtmt_Dayrecov_2 2.669089
## Heatcontrol:Trtmt_Dayresist 6.656851
## Heatcontrol:protegens1 2.561747
## Trtmt_Dayrecov_2:protegens1 2.918422
## Trtmt_Dayresist:protegens1 7.311226
## Heatcontrol:community_expected_mu 1.809675
## Trtmt_Dayrecov_2:community_expected_mu 2.322114
## Trtmt_Dayresist:community_expected_mu 6.417814
## protegens1:community_expected_mu 2.084017
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 3.671418
## Heatcontrol:Trtmt_Dayresist:protegens1 7.610139
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu 2.716498
## Heatcontrol:Trtmt_Dayresist:community_expected_mu 6.519094
## Heatcontrol:protegens1:community_expected_mu 2.705234
## Trtmt_Dayrecov_2:protegens1:community_expected_mu 2.975798
## Trtmt_Dayresist:protegens1:community_expected_mu 7.311158
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu 3.891699
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu 7.683676
## z value Pr(>|z|)
## (Intercept) -2.529 0.011428
## Heatcontrol 6.747 1.51e-11
## Trtmt_Dayrecov_2 0.001 0.999152
## Trtmt_Dayresist 1.262 0.207090
## protegens1 5.194 2.06e-07
## community_expected_mu 4.860 1.17e-06
## CommRich 0.738 0.460809
## Heatcontrol:Trtmt_Dayrecov_2 0.303 0.761727
## Heatcontrol:Trtmt_Dayresist -1.180 0.238030
## Heatcontrol:protegens1 -5.142 2.71e-07
## Trtmt_Dayrecov_2:protegens1 -0.163 0.870399
## Trtmt_Dayresist:protegens1 -1.532 0.125444
## Heatcontrol:community_expected_mu -4.883 1.04e-06
## Trtmt_Dayrecov_2:community_expected_mu -0.002 0.998423
## Trtmt_Dayresist:community_expected_mu -1.464 0.143101
## protegens1:community_expected_mu -4.115 3.88e-05
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 0.017 0.986573
## Heatcontrol:Trtmt_Dayresist:protegens1 1.405 0.160040
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu -0.736 0.461967
## Heatcontrol:Trtmt_Dayresist:community_expected_mu 1.342 0.179622
## Heatcontrol:protegens1:community_expected_mu 3.621 0.000294
## Trtmt_Dayrecov_2:protegens1:community_expected_mu 0.099 0.921305
## Trtmt_Dayresist:protegens1:community_expected_mu 1.223 0.221464
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu 0.259 0.795450
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu -1.092 0.274927
##
## (Intercept) *
## Heatcontrol ***
## Trtmt_Dayrecov_2
## Trtmt_Dayresist
## protegens1 ***
## community_expected_mu ***
## CommRich
## Heatcontrol:Trtmt_Dayrecov_2
## Heatcontrol:Trtmt_Dayresist
## Heatcontrol:protegens1 ***
## Trtmt_Dayrecov_2:protegens1
## Trtmt_Dayresist:protegens1
## Heatcontrol:community_expected_mu ***
## Trtmt_Dayrecov_2:community_expected_mu
## Trtmt_Dayresist:community_expected_mu
## protegens1:community_expected_mu ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens1
## Heatcontrol:Trtmt_Dayresist:protegens1
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu
## Heatcontrol:Trtmt_Dayresist:community_expected_mu
## Heatcontrol:protegens1:community_expected_mu ***
## Trtmt_Dayrecov_2:protegens1:community_expected_mu
## Trtmt_Dayresist:protegens1:community_expected_mu
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods48h[["*prot +mu + prot*CommRich"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.6436734 0.6454696 0.7505564 0.6581071 0.6482629 0.7228183 0.8126409 0.6537034 0.6654285 0.6474962 0.2318472 0.5936928 0.4862218 0.3618208 0.291319 0.2314288 0.4074107 0.2597992 0.2229924 0.7390945 ...
summary(absDen_mods48h[["*prot +mu + prot*CommRich"]])
## Family: genpois ( log )
## Formula:
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +
## community_expected_mu + protegens * CommRich
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## 3788.0 3848.2 -1878.0 3756.0 303
##
##
## Dispersion parameter for genpois family (): 621
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.8331268 0.4532965 6.250 4.1e-10
## Heatcontrol 4.3834846 0.3165532 13.848 < 2e-16
## Trtmt_Dayrecov_2 -0.0003534 0.4081133 -0.001 0.999309
## Trtmt_Dayresist -0.1724414 0.7613290 -0.227 0.820812
## protegens1 3.1809478 0.3757864 8.465 < 2e-16
## community_expected_mu -0.1011856 0.3844470 -0.263 0.792398
## CommRich 0.0851054 0.0880786 0.966 0.333922
## Heatcontrol:Trtmt_Dayrecov_2 -0.9651338 0.4435357 -2.176 0.029555
## Heatcontrol:Trtmt_Dayresist 0.0453817 0.7768166 0.058 0.953414
## Heatcontrol:protegens1 -4.4852090 0.3581642 -12.523 < 2e-16
## Trtmt_Dayrecov_2:protegens1 -0.2098648 0.4375742 -0.480 0.631505
## Trtmt_Dayresist:protegens1 -3.1328542 0.8215695 -3.813 0.000137
## protegens1:CommRich -0.0721257 0.1048607 -0.688 0.491563
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 0.9523355 0.4997122 1.906 0.056680
## Heatcontrol:Trtmt_Dayresist:protegens1 3.1195962 0.8529193 3.658 0.000255
##
## (Intercept) ***
## Heatcontrol ***
## Trtmt_Dayrecov_2
## Trtmt_Dayresist
## protegens1 ***
## community_expected_mu
## CommRich
## Heatcontrol:Trtmt_Dayrecov_2 *
## Heatcontrol:Trtmt_Dayresist
## Heatcontrol:protegens1 ***
## Trtmt_Dayrecov_2:protegens1
## Trtmt_Dayresist:protegens1 ***
## protegens1:CommRich
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 .
## Heatcontrol:Trtmt_Dayresist:protegens1 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# Select the best model across all data subsets
####################
# for each information criterion, get the average across all data subsets
meanIC <- data.frame(pars = sapply(absDen_mods48h, npar_of_glmmTMB_fit),
AIC = sapply(absDen_mods6h, AIC) + sapply(absDen_mods12h, AIC) + sapply(absDen_mods24h, AIC) + sapply(absDen_mods48h, AIC),
AICc = sapply(absDen_mods6h, AICc) + sapply(absDen_mods12h, AICc) + sapply(absDen_mods24h, AICc) + sapply(absDen_mods48h, AICc),
BIC = sapply(absDen_mods6h, BIC) + sapply(absDen_mods12h, BIC) + sapply(absDen_mods24h, BIC) + sapply(absDen_mods48h, BIC)) %>%
mutate(AIC = AIC/4,
AICc = AICc/4,
BIC = BIC/4) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC)
meanIC %>% arrange(BIC)
meanIC %>% arrange(AIC)
# clean up
rm(meanIC)
Both AIC and BIC agree that the best model is “*prot*mu +CommRich”. However, this model is rather complex. I’m less concerned with the danger of over-fitting, as each data subset has between 360 and 393 observations (so we are still within the rule-of-thumb of 10 to 15 observations per parameter). But I am very concerned about the number of interactions in this complex model. Recall that its full formula is ~ Heat*Trtmt_Day*protegens*community_expected_mu + CommRich. So it has a 4-way interaction!!! YIKES!
I really don’t want to use a model with this many interactions because it usually leads to poor estimates (which cannot be fixed with a post-hoc analysis), or at the very least to parameter estimates that are very hard to interpret (see above).
I feel confident that the complex model “*prot*mu +CommRich” is quite close to the “Truth” because the short heat pulse data prefer the interaction Heat*Trtmt_Day*community_expected_mu while the long heat pulse data prefer the interaction Heat*Trtmt_Day*protegens. So I think this complex model is the only one that’s able to accommodate both of these effects across all heat pulse durations. The problem is that it’s too complex to be interpretable.
Therefore, I will use BIC as the criterion for selecting a model that still captures the main effects but is much simpler. The model “*prot +mu + prot*CommRich” has a negligible \(\Delta\)BIC (recall the rule of thumb that \(\Delta\)BIC \(<2\) is not worth mentioning). The \(\Delta\)AIC between this simpler model and the best one is substantial, but AIC favours all the most complex models anyway (as expected, since AIC penalizes less for model complexity). Here we are much more interested in BIC because we need to simplify our model.
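As a minimal sketch of this \(\Delta\)BIC rule applied to the two 48h candidates (the BIC values are copied from the model summaries above; the little data frame here is just an illustration, not part of the pipeline):

```r
# BIC values reported above for the two 48h candidate models
ic_tab <- data.frame(model = c("*prot*mu +CommRich", "*prot +mu + prot*CommRich"),
                     BIC   = c(3847.9, 3848.2))
ic_tab$dBIC <- ic_tab$BIC - min(ic_tab$BIC)
# by the rule of thumb, models with dBIC < 2 are effectively tied
subset(ic_tab, dBIC < 2)
```

Both models survive the \(\Delta\)BIC \(<2\) filter, which is what licenses picking the simpler one.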
I think I should modify the structure of the analysis as follows.
First, check whether the most complex model is behaving sensibly here:
emmeans(absDen_mods48h[["*prot*mu +CommRich"]],
~ Heat | CommRich + Trtmt_Day*protegens*community_expected_mu,
data = absDen_48h, type = "response")
## CommRich = 2.15, Trtmt_Day = recov_1, protegens = 0, community_expected_mu = 0.893:
## Heat response SE df asymp.LCL asymp.UCL
## 48 18.8 6.63 Inf 9.40 37.5
## control 1412.9 166.00 Inf 1122.45 1778.5
##
## CommRich = 2.15, Trtmt_Day = recov_2, protegens = 0, community_expected_mu = 0.893:
## Heat response SE df asymp.LCL asymp.UCL
## 48 18.7 6.62 Inf 9.38 37.5
## control 531.9 79.80 Inf 396.32 713.9
##
## CommRich = 2.15, Trtmt_Day = resist, protegens = 0, community_expected_mu = 0.893:
## Heat response SE df asymp.LCL asymp.UCL
## 48 17.2 18.10 Inf 2.18 135.8
## control 1241.3 147.00 Inf 984.02 1565.8
##
## CommRich = 2.15, Trtmt_Day = recov_1, protegens = 1, community_expected_mu = 0.893:
## Heat response SE df asymp.LCL asymp.UCL
## 48 367.5 47.10 Inf 285.84 472.6
## control 331.2 45.10 Inf 253.62 432.4
##
## CommRich = 2.15, Trtmt_Day = recov_2, protegens = 1, community_expected_mu = 0.893:
## Heat response SE df asymp.LCL asymp.UCL
## 48 296.4 39.20 Inf 228.68 384.1
## control 263.7 35.90 Inf 201.95 344.4
##
## CommRich = 2.15, Trtmt_Day = resist, protegens = 1, community_expected_mu = 0.893:
## Heat response SE df asymp.LCL asymp.UCL
## 48 13.5 4.04 Inf 7.47 24.2
## control 285.1 38.00 Inf 219.59 370.2
##
## Unknown transformation "as.integer": no transformation done
## Confidence level used: 0.95
Let’s check whether the effect sizes of the complex model are consistent with those of the simpler model.
# plot the effect size contingent on protegens
effect_6h_protegens <- eff_size(emmeans(absDen_mods6h[["*prot*mu +CommRich"]], ~ Heat | CommRich + Trtmt_Day*protegens*community_expected_mu, data = absDen_6h),
sigma(absDen_mods6h[["*prot*mu +CommRich"]]),
edf = df.residual(absDen_mods6h[["*prot*mu +CommRich"]]))
effect_12h_protegens <- eff_size(emmeans(absDen_mods12h[["*prot*mu +CommRich"]], ~ Heat | CommRich + Trtmt_Day*protegens*community_expected_mu, data = absDen_12h),
sigma(absDen_mods12h[["*prot*mu +CommRich"]]),
edf = df.residual(absDen_mods12h[["*prot*mu +CommRich"]]))
effect_24h_protegens <- eff_size(emmeans(absDen_mods24h[["*prot*mu +CommRich"]], ~ Heat | CommRich + Trtmt_Day*protegens*community_expected_mu, data = absDen_24h),
sigma(absDen_mods24h[["*prot*mu +CommRich"]]),
edf = df.residual(absDen_mods24h[["*prot*mu +CommRich"]]))
effect_48h_protegens <- eff_size(emmeans(absDen_mods48h[["*prot*mu +CommRich"]], ~ Heat | CommRich + Trtmt_Day*protegens*community_expected_mu, data = absDen_48h),
sigma(absDen_mods48h[["*prot*mu +CommRich"]]),
edf = df.residual(absDen_mods48h[["*prot*mu +CommRich"]]))
# a function that extracts the confidence intervals from eff_size contingent on protegens
get_effsize_CIs <- function(eff_size_object, heat_trtmt) {
data.frame(Heat = heat_trtmt,
CommRich = confint(eff_size_object)[[2]],
Trtmt_Day = confint(eff_size_object)[[3]],
protegens = confint(eff_size_object)[[4]],
community_expected_mu = confint(eff_size_object)[[5]],
effect_est = confint(eff_size_object)[[6]], #[[5]],
effect_loCI = confint(eff_size_object)[[9]], #[[8]],
effect_hiCI = confint(eff_size_object)[[10]]) #[[9]])
}
# create a data.frame for plotting marginal effect sizes using a forest plot
productivity_protegens <- data.frame()
productivity_protegens <- rbind(productivity_protegens,
get_effsize_CIs(effect_6h_protegens, heat_trtmt = 6),
get_effsize_CIs(effect_12h_protegens, heat_trtmt = 12),
get_effsize_CIs(effect_24h_protegens, heat_trtmt = 24),
get_effsize_CIs(effect_48h_protegens, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
productivity_protegens$Trtmt_Day <- factor(productivity_protegens$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(productivity_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
#plot
ggplot(productivity_protegens,
aes(x = effect_est, y = as.factor(Heat), colour = Trtmt_Day, shape = protegens)) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = effect_loCI, xmax = effect_hiCI), height = 0.1) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Total Density",
y = "Heat duration (hrs)",
shape = "protegens\npresent?",
title = "*prot*mu +CommRich")
# But we are not interested in the details of protegens. Let's do the post-hoc by averaging across the effects of protegens.
posthoc_6h <- emmeans(effect_6h_protegens,
pairwise ~ Trtmt_Day,
data = absDen_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h <- emmeans(effect_12h_protegens,
pairwise ~ Trtmt_Day,
data = absDen_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h <- emmeans(effect_24h_protegens,
pairwise ~ Trtmt_Day,
data = absDen_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h <- emmeans(effect_48h_protegens,
pairwise ~ Trtmt_Day,
data = absDen_48h)
## NOTE: Results may be misleading due to involvement in interactions
# a function that extracts the confidence intervals from a post-hoc object *WITHOUT* protegens
get_posthoc_NOprot <- function(posthoc_object, heat_trtmt) {
output <- multcomp::cld(posthoc_object, alpha=0.05/4, Letters = letters) %>%
data.frame() %>%
select(-df)
colnames(output)[2:6] <- c("est", "SE", "loCI", "hiCI", "groups")
output$Heat <- heat_trtmt
return(output)
}
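The `alpha = 0.05/4` here is a Bonferroni correction across the four heat-duration subsets; the two-sided confidence level used for the wider CIs later in this section is just its complement:

```r
alpha_bon <- 0.05 / 4   # Bonferroni-corrected alpha across 4 heat durations
alpha_bon               # 0.0125
1 - alpha_bon           # 0.9875, the `level` passed to emmeans() for the wider CIs
```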
# create a data.frame for plotting marginal effect sizes using a forest plot with the group labels
productivity_effects <- data.frame()
productivity_effects <- rbind(productivity_effects,
get_posthoc_NOprot(posthoc_6h, heat_trtmt = 6),
get_posthoc_NOprot(posthoc_12h, heat_trtmt = 12),
get_posthoc_NOprot(posthoc_24h, heat_trtmt = 24),
get_posthoc_NOprot(posthoc_48h, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
productivity_effects$Trtmt_Day <- factor(productivity_effects$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(productivity_effects$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
# plot
ggplot(productivity_effects,
aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = loCI, xmax = hiCI), height = 0.1) +
geom_text(position = position_dodge(width = 0.5),
aes(x=-0.009, label=groups)) +
#scale_x_continuous(breaks=c(-0.006, -0.003, 0), limits=c(-0.01, 0.003)) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Total Density",
y = "Heat duration (hrs)",
title = "Averaged across protegens (with extinct reps)")
Okay, so it seems that adding the community growth rate as a predictor changes the picture considerably. We still see decoupling between resistance and recovery, but now its timing varies depending on community composition. For the communities without protegens, decoupling happens sooner (e.g., between resistance and early recovery at 12h), is maximized sooner (at 24h), and then disappears completely by 48h (i.e., because of extinction). On the other hand, we already saw in the extinction analysis above that communities with protegens are protected from extinction for all heat pulse durations investigated here. For these communities, decoupling only begins much later (at 48h). Although these results are more complex than what I had previously reported, they make more sense.
Recall that we are NOT interested in reporting the results from the most complex model. Are the effect sizes of the preferred, less complex model (“*prot +mu + prot*CommRich”) consistent with those above?
# plot the effect size contingent on protegens
effect_6h_protegens <- eff_size(emmeans(absDen_mods6h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_6h),
sigma(absDen_mods6h[["*prot +mu + prot*CommRich"]]),
edf = df.residual(absDen_mods6h[["*prot +mu + prot*CommRich"]]))
effect_12h_protegens <- eff_size(emmeans(absDen_mods12h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_12h),
sigma(absDen_mods12h[["*prot +mu + prot*CommRich"]]),
edf = df.residual(absDen_mods12h[["*prot +mu + prot*CommRich"]]))
effect_24h_protegens <- eff_size(emmeans(absDen_mods24h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_24h),
sigma(absDen_mods24h[["*prot +mu + prot*CommRich"]]),
edf = df.residual(absDen_mods24h[["*prot +mu + prot*CommRich"]]))
effect_48h_protegens <- eff_size(emmeans(absDen_mods48h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_48h),
sigma(absDen_mods48h[["*prot +mu + prot*CommRich"]]),
edf = df.residual(absDen_mods48h[["*prot +mu + prot*CommRich"]]))
# a function that extracts the confidence intervals from eff_size contingent on protegens
get_effsize_CIs <- function(eff_size_object, heat_trtmt) {
data.frame(Heat = heat_trtmt,
CommRich = confint(eff_size_object)[[5]],
Trtmt_Day = confint(eff_size_object)[[2]],
protegens = confint(eff_size_object)[[3]],
community_expected_mu = confint(eff_size_object)[[4]],
effect_est = confint(eff_size_object)[[6]], #[[5]],
effect_loCI = confint(eff_size_object)[[9]], #[[8]],
effect_hiCI = confint(eff_size_object)[[10]]) #[[9]])
}
# create a data.frame for plotting marginal effect sizes using a forest plot
productivity_protegens <- data.frame()
productivity_protegens <- rbind(productivity_protegens,
get_effsize_CIs(effect_6h_protegens, heat_trtmt = 6),
get_effsize_CIs(effect_12h_protegens, heat_trtmt = 12),
get_effsize_CIs(effect_24h_protegens, heat_trtmt = 24),
get_effsize_CIs(effect_48h_protegens, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
productivity_protegens$Trtmt_Day <- factor(productivity_protegens$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(productivity_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
#plot
ggplot(productivity_protegens,
aes(x = effect_est, y = as.factor(Heat), colour = Trtmt_Day, shape = protegens)) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = effect_loCI, xmax = effect_hiCI), height = 0.1) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Total Density",
y = "Heat duration (hrs)",
shape = "protegens\npresent?",
title = "*prot +mu + prot*CommRich")
# we can do a posthoc on this to illustrate statistically significant effects
posthocPROT_6h <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_12h <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_24h <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_48h <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_48h)
## NOTE: Results may be misleading due to involvement in interactions
# a function that extracts the confidence intervals from a post-hoc object *WITH* protegens
get_posthoc_YESprot <- function(posthoc_object, heat_trtmt) {
output <- multcomp::cld(posthoc_object, alpha=0.05/4, Letters = letters) %>%
data.frame() %>%
select(-df)
colnames(output)[3:7] <- c("est", "SE", "loCI", "hiCI", "groups")
output$Heat <- heat_trtmt
return(output)
}
# create a data.frame for plotting
prod_effects_protegens <- data.frame()
prod_effects_protegens <- rbind(prod_effects_protegens,
get_posthoc_YESprot(posthocPROT_6h, heat_trtmt = 6),
get_posthoc_YESprot(posthocPROT_12h, heat_trtmt = 12),
get_posthoc_YESprot(posthocPROT_24h, heat_trtmt = 24),
get_posthoc_YESprot(posthocPROT_48h, heat_trtmt = 48))
## note that for the decoupling plots I am using a Bonferroni-corrected alpha.
# let's get those confidence intervals because we will need them to plot decoupling below:
posthocPROT_6h_WIDER <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_6h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_12h_WIDER <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_12h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_24h_WIDER <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_24h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_48h_WIDER <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_48h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
# put these wider CIs into a table
widerCIs <- data.frame()
widerCIs <- rbind(widerCIs,
get_posthoc_YESprot(posthocPROT_6h_WIDER, heat_trtmt = 6),
get_posthoc_YESprot(posthocPROT_12h_WIDER, heat_trtmt = 12),
get_posthoc_YESprot(posthocPROT_24h_WIDER, heat_trtmt = 24),
get_posthoc_YESprot(posthocPROT_48h_WIDER, heat_trtmt = 48))
# rename the columns to remind us that this is the Bonferroni corrected alpha
colnames(widerCIs)[5:6] <- c("loCI_bonAlpha", "hiCI_bonAlpha")
# combine the wider CIs with the effect sizes
prod_effects_protegens <- inner_join(prod_effects_protegens,
widerCIs %>% select(-SE))
## Joining with `by = join_by(Trtmt_Day, protegens, est, groups, Heat)`
rm(widerCIs)
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
prod_effects_protegens$Trtmt_Day <- factor(prod_effects_protegens$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(prod_effects_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
# plot with group labels
ggplot(prod_effects_protegens,
aes(x = est, y = as.factor(Heat), colour = Trtmt_Day, shape=protegens)) +
facet_grid( ~ protegens) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = loCI, xmax = hiCI), height = 0.1) +
geom_text(position = position_dodge(width = 0.5),
aes(x=-0.0035, label=groups)) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Total Density",
y="Heat duration",
shape = "protegens\npresent?",
title = "*prot +mu + prot*CommRich")
###########################
# first define some functions
###########################
# a function that takes the multcomp::cld letters from 2 groups and returns TRUE when no letters are shared (or FALSE when any letter is shared)
are_groups_different <- function(group1, group2) {
# convert the groups columns into TRUE/FALSE columns indicating significant difference between resistance and recovery effect sizes
first_group <- group1 %>%
# remove any white space
str_trim() %>%
# split the string up into single characters
strsplit(x=., split = character(0))
second_group <- group2 %>%
# remove any white space
str_trim() %>%
# split the string up into single letters
strsplit(x=., split = character(0))
# test if any letters are common. If there are, then they are NOT different so return FALSE (and vice versa).
return( !any(first_group[[1]] %in% second_group[[1]]) )
}
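As a quick standalone sanity check of the letter-sharing logic (a hypothetical re-implementation using base-R `trimws` instead of `str_trim`, so it runs without the tidyverse):

```r
# TRUE when the two cld letter groups share no letters (i.e., significantly different)
share_no_letters <- function(group1, group2) {
  g1 <- strsplit(trimws(group1), split = "")[[1]]
  g2 <- strsplit(trimws(group2), split = "")[[1]]
  !any(g1 %in% g2)
}
share_no_letters("ab", "bc")   # FALSE: both contain "b", so not different
share_no_letters(" a ", "bc")  # TRUE: no shared letters, so different
```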
# a function to calculate distance from the point (x, y) to the line y = x: positive values are ABOVE the line and negative values are BELOW the line.
# this is used to calculate decoupling
dist_to_xyline <- function(x, y) {
(y - x) / sqrt(2) # signed perpendicular distance from (x, y) to the line y = x
}
# a function to estimate mean decoupling and its confidence intervals given mean and SYMMETRIC confidence intervals for resistance and recovery.
# Note that using the univariate confidence intervals requires assuming no correlation between resistance and recovery (which is exactly the opposite of the whole point of coupling).
# ...also, beware that the CIs come from a post-hoc, so they are more conservative than the real CIs...
estimate_decoupling <- function(resist_est, resist_hiCI,
recov_est, recov_hiCI) {
# check the input values
if(resist_hiCI < resist_est)
stop("`resist_hiCI` must be the *UPPER* confidence interval on resistance.")
if(recov_hiCI < recov_est)
stop("`recov_hiCI` must be the *UPPER* confidence interval on recovery.")
# get the co-ordinates that define the ellipse
x0 <- resist_est # x-coordinate of the center of the ellipse
y0 <- recov_est # y-coordinate of the center of the ellipse
a <- resist_hiCI - resist_est # horizontal semi-axis (CI radius along x)
b <- recov_hiCI - recov_est # vertical semi-axis (CI radius along y)
# generate points on the perimeter of the ellipse
theta <- seq(0, 2 * pi, length.out = 360) # angles
x_ellipse <- x0 + a * cos(theta) # x-coordinates on the ellipse
y_ellipse <- y0 + b * sin(theta) # y-coordinates on the ellipse
# decoupling measures the signed distance between the point and the y=x line
est <- dist_to_xyline(x0, y0) # named `est` to avoid masking base::mean
# do the same for all points on the ellipse defining the CI
distances <- dist_to_xyline(x_ellipse, y_ellipse)
# maximum and minimum distances define the hiCI and loCI, respectively
hiCI <- max(distances)
loCI <- min(distances)
return(c(est_decoupling = est, loCI_decoupling = loCI, hiCI_decoupling = hiCI))
}
# positive values are ABOVE the y=x line and negative values are BELOW the y=x line
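With made-up inputs, a point just above the y = x line whose CI ellipse straddles the line gives a positive decoupling estimate with a confidence interval that spans zero:

```r
# hypothetical effect sizes and upper CIs (not from the real data)
estimate_decoupling(resist_est = 0.001, resist_hiCI = 0.003,
                    recov_est  = 0.002, recov_hiCI  = 0.004)
# est_decoupling is positive (0.001/sqrt(2)), but loCI_decoupling < 0 < hiCI_decoupling
```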
#############################
# Decoupling
#############################
decoupling_productivity <- prod_effects_protegens %>% select(-loCI, -hiCI)
# keep just the Bonferroni-corrected (wider) confidence intervals
decoupling_productivity <- decoupling_productivity %>%
rename(loCI = loCI_bonAlpha,
hiCI = hiCI_bonAlpha)
# for easier coding, rename the levels of Trtmt_Day
levels(decoupling_productivity$Trtmt_Day) <- c("resist", "early_recov", "late_recov")
# create data.frame for plotting
decoupling_productivity <- decoupling_productivity %>%
pivot_wider(names_from = Trtmt_Day,
values_from = c(est, loCI, hiCI, SE, groups))
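With multiple `values_from` columns, `pivot_wider()` builds the new column names by pasting each value name to each `Trtmt_Day` level, which is where columns like `est_resist` come from. A toy example (hypothetical numbers):

```r
library(tidyr)
toy <- data.frame(Heat = c(6, 6),
                  Trtmt_Day = c("resist", "early_recov"),
                  est = c(0.10, 0.20),
                  SE  = c(0.01, 0.02))
pivot_wider(toy, names_from = Trtmt_Day, values_from = c(est, SE))
# -> one row per Heat, with columns est_resist, est_early_recov, SE_resist, SE_early_recov
```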
# columns that indicate if resistance is significantly different from recovery
decoupling_productivity$early_recov_VS_resist <- mapply(are_groups_different,
decoupling_productivity$groups_early_recov,
decoupling_productivity$groups_resist)
decoupling_productivity$late_recov_VS_resist <- mapply(are_groups_different,
decoupling_productivity$groups_late_recov,
decoupling_productivity$groups_resist)
# clean up extra columns
decoupling_productivity <- decoupling_productivity %>% select(-groups_resist, -groups_early_recov, -groups_late_recov)
# first plot the decoupling on early recovery
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
facet_grid(~protegens) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
geom_errorbar(aes(ymin = loCI_early_recov, ymax = hiCI_early_recov), width=0) +
# center the plot on 0,0:
scale_x_continuous(limits = c(-0.008, 0.008), expand = c(0, 0)) +
scale_y_continuous(limits = c(-0.0045, 0.0045), expand = c(0, 0)) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
labs(title = "Decoupling of productivity (with extinct reps)",
x = "Resistance +/- 95% CI",
y = "Early Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_errorbarh()`).
# here's another way to plot it where the confidence intervals are shown as ellipses (geom_ellipse() comes from the ggforce package):
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
facet_grid(~protegens) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
geom_ellipse(aes(x0 = est_resist,
y0 = est_early_recov,
# radius on x direction:
a = hiCI_resist - est_resist,
# radius on y direction:
b = hiCI_early_recov - est_early_recov,
angle = 0)) +
labs(title = "Decoupling of productivity (with extinct reps)",
x = "Resistance +/- 95% CI",
y = "Early Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# next plot the decoupling on later recovery
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
facet_grid(~protegens) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
geom_errorbar(aes(ymin = loCI_late_recov, ymax = hiCI_late_recov), width=0) +
# center the plot on 0,0:
scale_x_continuous(limits = c(-0.008, 0.008), expand = c(0, 0)) +
scale_y_continuous(limits = c(-0.0036, 0.0036), expand = c(0, 0)) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
labs(title = "Decoupling of productivity (with extinct reps)",
x = "Resistance +/- 95% CI",
y = "Late Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_errorbarh()`).
# late recovery with CI plotted as ellipses:
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
facet_grid(~protegens) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
geom_ellipse(aes(x0 = est_resist,
y0 = est_late_recov,
# radius on x direction:
a = hiCI_resist - est_resist,
# radius on y direction:
b = hiCI_late_recov - est_late_recov,
angle = 0)) +
labs(title = "Decoupling of productivity (with extinct reps)",
x = "Resistance +/- 95% CI",
y = "Late Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# finally estimate decoupling by getting the distance to the y=x line
# calculate decoupling between resistance and early recovery
early_decoupling <- t(with(decoupling_productivity,
mapply(estimate_decoupling,
resist_est = est_resist,
resist_hiCI = hiCI_resist,
recov_est = est_early_recov,
recov_hiCI = hiCI_early_recov)))
# add annotation
early_decoupling <- cbind(decoupling_productivity[,1:2],
early_decoupling)
ggplot(early_decoupling,
aes(x = as.factor(Heat), y = est_decoupling)) +
facet_grid(~protegens) +
geom_hline(yintercept = 0, colour = "grey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbar(position = position_dodge(width = 0.5),
aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
alpha=0.4, width=0.1) +
labs(title = "Early recovery (WITH extinct reps)",
y = "Decoupling +/- 95% CI",
x = "Heat Duration (hrs)")
# calculate decoupling between resistance and late recovery
late_decoupling <- t(with(decoupling_productivity,
mapply(estimate_decoupling,
resist_est = est_resist,
resist_hiCI = hiCI_resist,
recov_est = est_late_recov,
recov_hiCI = hiCI_late_recov)))
# add annotation
late_decoupling <- cbind(decoupling_productivity[,1:2],
late_decoupling)
ggplot(late_decoupling,
aes(x = as.factor(Heat), y = est_decoupling)) +
facet_grid(~protegens) +
geom_hline(yintercept = 0, colour = "grey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbar(position = position_dodge(width = 0.5),
aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
alpha=0.4, width=0.1) +
scale_colour_viridis_d(option = "viridis", end=0.85) +
labs(title = "Late recovery (WITH extinct reps)",
y = "Decoupling +/- 95% CI",
x = "Heat Duration (hrs)")
##############################
# effect sizes with protegens as non-focal
##############################
# But we are not interested in the details of protegens. Let's do the post-hoc again now averaging across the effects of protegens.
posthoc_6h <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day, data = absDen_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day, data = absDen_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day, data = absDen_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day, data = absDen_48h)
## NOTE: Results may be misleading due to involvement in interactions
# create a data.frame for plotting marginal effect sizes using a forest plot with the group labels
productivity_effects <- data.frame()
productivity_effects <- rbind(productivity_effects,
get_posthoc_NOprot(posthoc_6h, heat_trtmt = 6),
get_posthoc_NOprot(posthoc_12h, heat_trtmt = 12),
get_posthoc_NOprot(posthoc_24h, heat_trtmt = 24),
get_posthoc_NOprot(posthoc_48h, heat_trtmt = 48))
## note that for the decoupling plots I am using a Bonferroni-corrected alpha: level = 0.9875 = 1 - 0.05/4, i.e., corrected across the four heat durations.
# let's get those confidence intervals because we will need them to plot decoupling below:
posthoc_6h_WIDER <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day, data = absDen_6h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h_WIDER <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day, data = absDen_12h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h_WIDER <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day, data = absDen_24h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h_WIDER <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day, data = absDen_48h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
# put these wider CIs into a table
widerCIs <- data.frame()
widerCIs <- rbind(widerCIs,
get_posthoc_NOprot(posthoc_6h_WIDER, heat_trtmt = 6),
get_posthoc_NOprot(posthoc_12h_WIDER, heat_trtmt = 12),
get_posthoc_NOprot(posthoc_24h_WIDER, heat_trtmt = 24),
get_posthoc_NOprot(posthoc_48h_WIDER, heat_trtmt = 48))
# rename the columns to remind us that this is the Bonferroni-corrected alpha
colnames(widerCIs)[4:5] <- c("loCI_bonAlpha", "hiCI_bonAlpha")
# combine the wider CIs with the effect sizes
productivity_effects <- inner_join(productivity_effects,
widerCIs %>% select(-SE))
## Joining with `by = join_by(Trtmt_Day, est, groups, Heat)`
rm(widerCIs)
# re-order the levels of Trtmt_Day to go from resistance to recovery, then rename them for nice plotting
productivity_effects$Trtmt_Day <- factor(productivity_effects$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(productivity_effects$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
# plot with group labels
ggplot(productivity_effects,
aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = loCI, xmax = hiCI), height = 0.1) +
geom_text(position = position_dodge(width = 0.5),
aes(x=-0.008, label=groups)) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Total Density",
y = "Heat duration (hrs)",
title = "protegens as non-focal predictor (i.e., marginalized)")
#######
# finally, we will do a series of pairwise two-tailed t-tests to compare between heat durations
#######
# estimate the sample sizes
temp <- productivity_effects # copy the effects to temp
productivity_effects <- rbind(temp %>% filter(Heat == 6) %>% mutate(n = estimate_n(absDen_6h)),
temp %>% filter(Heat == 12) %>% mutate(n = estimate_n(absDen_12h)),
temp %>% filter(Heat == 24) %>% mutate(n = estimate_n(absDen_24h)),
temp %>% filter(Heat == 48) %>% mutate(n = estimate_n(absDen_48h)))
rm(temp)
# estimate the SD from the SE
productivity_effects <- productivity_effects %>% mutate(SD = SE * sqrt(n)) %>%
# re-order by Heat and Trtmt_Day
arrange(Heat, Trtmt_Day)
# all pairwise comparisons between heat durations within the same treatment day
# after arrange(Heat, Trtmt_Day) above, rows 1, 4, 7, and 10 are the Resistance rows for the four heat durations; +1 and +2 shift to Early and Late Recovery
temp <- t(combn(c(1,4,7,10), 2))
combos <- rbind(temp, temp+1, temp+2)
rm(temp)
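`run_ttest()` is a helper defined earlier in the document; a minimal sketch of what a two-tailed Welch t-test from summary statistics involves (hypothetical function name, not the actual helper) is:

```r
# hypothetical re-implementation: Welch's two-sample t-test from summary statistics
welch_from_summary <- function(mean_x, sd_x, n_x, mean_y, sd_y, n_y) {
  se2_x <- sd_x^2 / n_x          # squared standard errors
  se2_y <- sd_y^2 / n_y
  t_stat <- (mean_x - mean_y) / sqrt(se2_x + se2_y)
  # Welch-Satterthwaite approximation for the degrees of freedom
  df <- (se2_x + se2_y)^2 / (se2_x^2 / (n_x - 1) + se2_y^2 / (n_y - 1))
  data.frame(t_statistic = t_stat, df = df,
             pvalue = 2 * pt(-abs(t_stat), df))  # two-tailed
}
welch_from_summary(0.5, 1, 20, 0, 1, 20)
```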
# loop through all the combinations and do the t-tests
prodEffects_ttests <- data.frame()
for(i in 1:nrow(combos)){
prodEffects_ttests <- rbind(prodEffects_ttests,
run_ttest(row_x = combos[i,1],
row_y = combos[i,2],
summary_stats_df = productivity_effects))
}
prodEffects_ttests$adjusted_p <- p.adjust(prodEffects_ttests$pvalue, method = "bonferroni")
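For reference, `p.adjust()` with `method = "bonferroni"` simply multiplies each p-value by the number of tests, capped at 1:

```r
p.adjust(c(0.01, 0.04, 0.9), method = "bonferroni")
# equivalent to pmin(1, 3 * c(0.01, 0.04, 0.9)), i.e. 0.03 0.12 1.00
```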
prodEffects_ttests$Trtmt_Day <- productivity_effects$Trtmt_Day[combos[,1]]
prodEffects_ttests$Heat_1 <- productivity_effects$Heat[combos[,1]]
prodEffects_ttests$Heat_2 <- productivity_effects$Heat[combos[,2]]
print(prodEffects_ttests)
## t_statistic df pvalue adjusted_p Trtmt_Day Heat_1 Heat_2
## t 10.47327399 30.75149 1.154061e-11 2.077310e-10 Resistance 6 12
## t1 21.17673146 28.53041 5.417906e-19 9.752231e-18 Resistance 6 24
## t2 32.89317973 18.11665 1.306276e-17 2.351297e-16 Resistance 6 48
## t3 6.85581717 22.61200 5.966686e-07 1.074004e-05 Resistance 12 24
## t4 27.60250117 19.01350 8.356449e-17 1.504161e-15 Resistance 12 48
## t5 26.05248055 16.16656 1.238834e-14 2.229901e-13 Resistance 24 48
## t6 -0.07249429 32.00521 9.426597e-01 1.000000e+00 Early Recovery 6 12
## t7 8.26687963 27.56078 6.068705e-09 1.092367e-07 Early Recovery 6 24
## t8 40.91143292 28.29271 1.011875e-26 1.821374e-25 Early Recovery 6 48
## t9 8.34414769 23.60875 1.687811e-08 3.038061e-07 Early Recovery 12 24
## t10 40.94532357 27.33141 4.530209e-26 8.154376e-25 Early Recovery 12 48
## t11 40.28429928 20.10302 1.059894e-20 1.907809e-19 Early Recovery 24 48
## t12 14.19702257 31.87762 2.481735e-15 4.467122e-14 Late Recovery 6 12
## t13 10.92229280 24.24178 7.582814e-11 1.364907e-09 Late Recovery 6 24
## t14 35.63914918 29.27681 1.139762e-25 2.051572e-24 Late Recovery 6 48
## t15 -7.44066706 20.33961 3.153933e-07 5.677079e-06 Late Recovery 12 24
## t16 23.04010331 28.32362 6.877370e-20 1.237927e-18 Late Recovery 12 48
## t17 33.76923521 18.60434 3.733642e-18 6.720555e-17 Late Recovery 24 48
# these p-values seem overly optimistic. Use alpha = 1*10^-3
Awesome! The effect sizes of the simpler model are indeed consistent with those from the complex model. We will focus on the parameter estimates and effect sizes from the simpler model.
decoupling_productivity <- productivity_effects %>% select(-loCI, -hiCI)
# keep just the Bonferroni-corrected (wider) confidence intervals
decoupling_productivity <- decoupling_productivity %>%
rename(loCI = loCI_bonAlpha,
hiCI = hiCI_bonAlpha)
# for easier coding, rename the levels of Trtmt_Day
levels(decoupling_productivity$Trtmt_Day) <- c("resist", "early_recov", "late_recov")
# create data.frame for plotting
decoupling_productivity <- decoupling_productivity %>%
select(-n, -SD) %>%
pivot_wider(names_from = Trtmt_Day,
values_from = c(est, loCI, hiCI, SE, groups))
# columns that indicate if resistance is significantly different from recovery
decoupling_productivity$early_recov_VS_resist <- mapply(are_groups_different,
decoupling_productivity$groups_early_recov,
decoupling_productivity$groups_resist)
decoupling_productivity$late_recov_VS_resist <- mapply(are_groups_different,
decoupling_productivity$groups_late_recov,
decoupling_productivity$groups_resist)
# clean up extra columns
decoupling_productivity <- decoupling_productivity %>% select(-groups_resist, -groups_early_recov, -groups_late_recov)
# first plot the decoupling on early recovery
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
#facet_grid(~CommRich) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
geom_errorbar(aes(ymin = loCI_early_recov, ymax = hiCI_early_recov), width=0) +
# center the plot on 0,0:
scale_x_continuous(limits = c(-0.008, 0.008), expand = c(0, 0)) +
scale_y_continuous(limits = c(-0.0045, 0.0045), expand = c(0, 0)) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
labs(title = "Decoupling of productivity (with extinct reps)",
x = "Resistance +/- 95% CI",
y = "Early Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# here's another way to plot it where the confidence intervals are shown as ellipses:
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
geom_ellipse(aes(x0 = est_resist,
y0 = est_early_recov,
# radius on x direction:
a = hiCI_resist - est_resist,
# radius on y direction:
b = hiCI_early_recov - est_early_recov,
angle = 0)) +
labs(title = "Decoupling of productivity (with extinct reps)",
x = "Resistance +/- 95% CI",
y = "Early Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# next plot the decoupling on later recovery
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
geom_errorbar(aes(ymin = loCI_late_recov, ymax = hiCI_late_recov), width=0) +
# center the plot on 0,0:
scale_x_continuous(limits = c(-0.008, 0.008), expand = c(0, 0)) +
scale_y_continuous(limits = c(-0.0036, 0.0036), expand = c(0, 0)) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
labs(title = "Decoupling of productivity (with extinct reps)",
x = "Resistance +/- 95% CI",
y = "Late Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# late recovery with CI plotted as ellipses:
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
geom_ellipse(aes(x0 = est_resist,
y0 = est_late_recov,
# radius on x direction:
a = hiCI_resist - est_resist,
# radius on y direction:
b = hiCI_late_recov - est_late_recov,
angle = 0)) +
labs(title = "Decoupling of productivity (with extinct reps)",
x = "Resistance +/- 95% CI",
y = "Late Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# finally estimate decoupling by getting the distance to the y=x line
# calculate decoupling between resistance and early recovery
early_decoupling <- t(with(decoupling_productivity,
mapply(estimate_decoupling,
resist_est = est_resist,
resist_hiCI = hiCI_resist,
recov_est = est_early_recov,
recov_hiCI = hiCI_early_recov)))
# add annotation
early_decoupling <- cbind(decoupling_productivity[,1:2],
early_decoupling)
ggplot(early_decoupling,
aes(x = as.factor(Heat), y = est_decoupling)) +
geom_hline(yintercept = 0, colour = "grey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbar(position = position_dodge(width = 0.5),
aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
alpha=0.4, width=0.1) +
labs(title = "Early recovery (WITH extinct reps)",
y = "Decoupling +/- 95% CI",
x = "Heat Duration (hrs)")
# calculate decoupling between resistance and late recovery
late_decoupling <- t(with(decoupling_productivity,
mapply(estimate_decoupling,
resist_est = est_resist,
resist_hiCI = hiCI_resist,
recov_est = est_late_recov,
recov_hiCI = hiCI_late_recov)))
# add annotation
late_decoupling <- cbind(decoupling_productivity[,1:2],
late_decoupling)
ggplot(late_decoupling,
aes(x = as.factor(Heat), y = est_decoupling)) +
geom_hline(yintercept = 0, colour = "grey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbar(position = position_dodge(width = 0.5),
aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
alpha=0.4, width=0.1) +
scale_colour_viridis_d(option = "viridis", end=0.85) +
labs(title = "Late recovery (WITH extinct reps)",
y = "Decoupling +/- 95% CI",
x = "Heat Duration (hrs)")
# clean up
rm(absDen_6h, absDen_12h, absDen_24h, absDen_48h, absDen_mods6h, absDen_mods12h, absDen_mods24h, absDen_mods48h,
combos, decoupling_productivity, early_decoupling, late_decoupling,
effect_6h, effect_12h, effect_24h, effect_48h, effect_6h_protegens, effect_12h_protegens, effect_24h_protegens, effect_48h_protegens,
posthoc_6h, posthoc_12h, posthoc_24h, posthoc_48h, posthoc_6h_WIDER, posthoc_12h_WIDER, posthoc_24h_WIDER, posthoc_48h_WIDER, posthocPROT_6h, posthocPROT_12h, posthocPROT_24h, posthocPROT_48h, posthocPROT_6h_WIDER, posthocPROT_12h_WIDER, posthocPROT_24h_WIDER, posthocPROT_48h_WIDER,
prod_effects_protegens, prodEffects_ttests, productivity_effects, productivity_protegens)
Let’s remove the extinct replicates to focus just on the data where the communities survived.
# add a column indicating whether the replicate survived
# but first we need to remove $Heat because it is a factor in absDen_forFit but numeric in extinct.df, so the tables cannot be *_joined on it
tmp_div <- absDen_forFit %>% select(-Heat)
tmp_div <- inner_join(tmp_div,
extinct.df %>% select(uniqID, survived),
by = c("uniqID"))
absDen_forFit$survived <- tmp_div$survived
rm(tmp_div)
# keep just the replicates that did not go extinct
absDen_forFit <- absDen_forFit %>% filter(survived == 1)
####################
# 6h heat duration
####################
# grab just the treatment with its associated control data
absDen_6h <- rbind(absDen_forFit %>% filter(Heat == "6"),
absDen_forFit %>% filter(Heat == "control", Day < 4))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_6h$Trtmt_Day <- "resist"
absDen_6h$Trtmt_Day[absDen_6h$Day == 2] <- "recov_1"
absDen_6h$Trtmt_Day[absDen_6h$Day == 3] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_6h$Trtmt_Day <- as.factor(absDen_6h$Trtmt_Day)
absDen_6h$Heat <- droplevels(absDen_6h$Heat)
absDen_6h$resistant <- as.factor(absDen_6h$resistant)
absDen_6h$protegens <- as.factor(absDen_6h$protegens)
# fit different models:
absDen_mods6h <- fit_productivity_models(absDen_6h)
# check the simplest possible models for multicolinearity
check_collinearity(absDen_mods6h[["simple"]])
check_collinearity(absDen_mods6h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods6h, npar_of_glmmTMB_fit),
AIC = sapply(absDen_mods6h, AIC),
AICc = sapply(absDen_mods6h, AICc),
BIC = sapply(absDen_mods6h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
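`AICc()` here presumably applies the small-sample correction to AIC; for a model with k estimated parameters fit to n observations it can be computed by hand as (hypothetical helper name):

```r
# small-sample corrected AIC: AICc = AIC + 2k(k + 1) / (n - k - 1)
aicc_by_hand <- function(aic, k, n) {
  aic + (2 * k * (k + 1)) / (n - k - 1)
}
aicc_by_hand(aic = 100, k = 5, n = 30)  # 100 + 60/24 = 102.5
```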
# plot the best model for 6h:
print(plot_model_pred.MU(mod_list=absDen_mods6h, mod_name="*mu +prot"))
# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods6h, mod_name="*prot*mu +CommRich"))
# plot the preferred model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods6h, mod_name="*prot +mu + prot*CommRich"))
# check the fit and estimates of the best model for the complete data:
#simulateResiduals(fittedModel = absDen_mods6h[["*prot*mu +CommRich"]], plot = TRUE)
#summary(absDen_mods6h[["*prot*mu +CommRich"]])
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods6h[["*prot +mu + prot*CommRich"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.624 0.96 0.724 0.6761063 0.992 0.872 0.584 0.02 0.916 0.4455416 0.964 1 0.752 0.4 0.864 0.4146115 0.112 0.5655875 0.5192903 0.572 ...
summary(absDen_mods6h[["*prot +mu + prot*CommRich"]])
## Family: genpois ( log )
## Formula:
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +
## community_expected_mu + protegens * CommRich
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## 5662.5 5726.0 -2815.2 5630.5 376
##
##
## Dispersion parameter for genpois family (): 368
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.839618 0.228850 34.26 < 2e-16
## Heatcontrol -0.180861 0.118842 -1.52 0.1280
## Trtmt_Dayrecov_2 -0.018307 0.097866 -0.19 0.8516
## Trtmt_Dayresist -0.008151 0.101418 -0.08 0.9359
## protegens1 -1.304007 0.181747 -7.17 7.24e-13
## community_expected_mu -0.491750 0.214502 -2.29 0.0219
## CommRich -0.003971 0.044970 -0.09 0.9296
## Heatcontrol:Trtmt_Dayrecov_2 -0.137528 0.167598 -0.82 0.4119
## Heatcontrol:Trtmt_Dayresist 0.174712 0.164347 1.06 0.2878
## Heatcontrol:protegens1 -0.061370 0.180549 -0.34 0.7339
## Trtmt_Dayrecov_2:protegens1 -0.333568 0.165252 -2.02 0.0435
## Trtmt_Dayresist:protegens1 0.061134 0.164067 0.37 0.7094
## protegens1:CommRich 0.032276 0.064557 0.50 0.6171
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 0.075049 0.261004 0.29 0.7737
## Heatcontrol:Trtmt_Dayresist:protegens1 0.251918 0.250033 1.01 0.3137
##
## (Intercept) ***
## Heatcontrol
## Trtmt_Dayrecov_2
## Trtmt_Dayresist
## protegens1 ***
## community_expected_mu *
## CommRich
## Heatcontrol:Trtmt_Dayrecov_2
## Heatcontrol:Trtmt_Dayresist
## Heatcontrol:protegens1
## Trtmt_Dayrecov_2:protegens1 *
## Trtmt_Dayresist:protegens1
## protegens1:CommRich
## Heatcontrol:Trtmt_Dayrecov_2:protegens1
## Heatcontrol:Trtmt_Dayresist:protegens1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# 12h heat duration
####################
# grab just the treatment with its associated control data
absDen_12h <- rbind(absDen_forFit %>% filter(Heat == "12", Day > 1),
absDen_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_12h$Trtmt_Day <- "resist"
absDen_12h$Trtmt_Day[absDen_12h$Day == 3] <- "recov_1"
absDen_12h$Trtmt_Day[absDen_12h$Day == 4] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_12h$Trtmt_Day <- as.factor(absDen_12h$Trtmt_Day)
absDen_12h$Heat <- droplevels(absDen_12h$Heat)
absDen_12h$resistant <- as.factor(absDen_12h$resistant)
absDen_12h$protegens <- as.factor(absDen_12h$protegens)
# fit different models:
absDen_mods12h <- fit_productivity_models(absDen_12h)
# check the simplest possible models for multicolinearity
check_collinearity(absDen_mods12h[["simple"]])
check_collinearity(absDen_mods12h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods12h, npar_of_glmmTMB_fit),
AIC = sapply(absDen_mods12h, AIC),
AICc = sapply(absDen_mods12h, AICc),
BIC = sapply(absDen_mods12h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
# plot the best model for 12h:
print(plot_model_pred.MU(mod_list=absDen_mods12h, mod_name="*mu +prot"))
# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods12h, mod_name="*prot*mu +CommRich"))
# plot the preferred model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods12h, mod_name="*prot +mu + prot*CommRich"))
# check the fit and estimates of the best model for the complete data:
#simulateResiduals(fittedModel = absDen_mods12h[["*prot*mu +CommRich"]], plot = TRUE)
#summary(absDen_mods12h[["*prot*mu +CommRich"]])
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods12h[["*prot +mu + prot*CommRich"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.876 0.632 0.2287389 0.686251 0.2893171 0.468 0.08077386 0.628 0.82 0.42 0.672 0.708 0.532 0.4667235 0.568 0.584 0.4163186 0.3632073 0.5082269 0.92 ...
summary(absDen_mods12h[["*prot +mu + prot*CommRich"]])
## Family: genpois ( log )
## Formula:
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +
## community_expected_mu + protegens * CommRich
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## 4935.9 4997.8 -2452.0 4903.9 337
##
##
## Dispersion parameter for genpois family (): 446
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.01699 0.32692 21.464 < 2e-16
## Heatcontrol -0.21263 0.13850 -1.535 0.1247
## Trtmt_Dayrecov_2 -0.37577 0.14055 -2.674 0.0075
## Trtmt_Dayresist -0.86993 0.16953 -5.131 2.88e-07
## protegens1 -1.25584 0.23759 -5.286 1.25e-07
## community_expected_mu 0.07319 0.29296 0.250 0.8027
## CommRich 0.11143 0.07060 1.578 0.1145
## Heatcontrol:Trtmt_Dayrecov_2 0.51493 0.20295 2.537 0.0112
## Heatcontrol:Trtmt_Dayresist 1.11031 0.21161 5.247 1.55e-07
## Heatcontrol:protegens1 -0.09314 0.20466 -0.455 0.6490
## Trtmt_Dayrecov_2:protegens1 0.18835 0.20302 0.928 0.3535
## Trtmt_Dayresist:protegens1 0.89448 0.21920 4.081 4.49e-05
## protegens1:CommRich -0.08730 0.08660 -1.008 0.3134
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 -0.17894 0.29796 -0.601 0.5481
## Heatcontrol:Trtmt_Dayresist:protegens1 -0.71900 0.29616 -2.428 0.0152
##
## (Intercept) ***
## Heatcontrol
## Trtmt_Dayrecov_2 **
## Trtmt_Dayresist ***
## protegens1 ***
## community_expected_mu
## CommRich
## Heatcontrol:Trtmt_Dayrecov_2 *
## Heatcontrol:Trtmt_Dayresist ***
## Heatcontrol:protegens1
## Trtmt_Dayrecov_2:protegens1
## Trtmt_Dayresist:protegens1 ***
## protegens1:CommRich
## Heatcontrol:Trtmt_Dayrecov_2:protegens1
## Heatcontrol:Trtmt_Dayresist:protegens1 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# 24h heat duration
####################
# grab just the treatment with its associated control data
absDen_24h <- rbind(absDen_forFit %>% filter(Heat == "24", Day > 1),
absDen_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_24h$Trtmt_Day <- "resist"
absDen_24h$Trtmt_Day[absDen_24h$Day == 3] <- "recov_1"
absDen_24h$Trtmt_Day[absDen_24h$Day == 4] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_24h$Trtmt_Day <- as.factor(absDen_24h$Trtmt_Day)
absDen_24h$Heat <- droplevels(absDen_24h$Heat)
absDen_24h$resistant <- as.factor(absDen_24h$resistant)
absDen_24h$protegens <- as.factor(absDen_24h$protegens)
# fit different models:
absDen_mods24h <- fit_productivity_models(absDen_24h)
# check the simplest possible models for multicolinearity
check_collinearity(absDen_mods24h[["simple"]])
check_collinearity(absDen_mods24h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods24h, npar_of_glmmTMB_fit),
AIC = sapply(absDen_mods24h, AIC),
AICc = sapply(absDen_mods24h, AICc),
BIC = sapply(absDen_mods24h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
# plot the best model for the 24h data without extinctions:
print(plot_model_pred.CommRich(mod_list=absDen_mods24h, mod_name="*prot*mu +CommRich"))
# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods24h, mod_name="*prot*mu +CommRich"))
# plot the preferred model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods24h, mod_name="*prot +mu + prot*CommRich"))
# check the fit and estimates of the best model for the complete data:
#simulateResiduals(fittedModel = absDen_mods24h[["*prot*mu +CommRich"]], plot = TRUE)
#summary(absDen_mods24h[["*prot*mu +CommRich"]])
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods24h[["*prot +mu + prot*CommRich"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.5483252 0.384947 0.332 0.6092959 0.688 0.5308852 0.464 0.2199125 0.2281527 0.2535313 0.7 0.752 0.5366702 0.144 0.2 0.2025676 0.4616551 0.284 0.076 0.2775874 ...
summary(absDen_mods24h[["*prot +mu + prot*CommRich"]])
## Family: genpois ( log )
## Formula:
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +
## community_expected_mu + protegens * CommRich
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## 4891.5 4953.2 -2429.8 4859.5 334
##
##
## Dispersion parameter for genpois family (): 769
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.71414 0.31230 15.095 < 2e-16
## Heatcontrol 1.18929 0.24211 4.912 9.01e-07
## Trtmt_Dayrecov_2 2.04329 0.21341 9.574 < 2e-16
## Trtmt_Dayresist -1.58085 0.27065 -5.841 5.19e-09
## protegens1 0.98405 0.24890 3.954 7.70e-05
## community_expected_mu 0.70145 0.32696 2.145 0.031924
## CommRich 0.32409 0.07376 4.394 1.11e-05
## Heatcontrol:Trtmt_Dayrecov_2 -1.86530 0.27428 -6.801 1.04e-11
## Heatcontrol:Trtmt_Dayresist 1.92680 0.33187 5.806 6.40e-09
## Heatcontrol:protegens1 -1.74930 0.28922 -6.048 1.46e-09
## Trtmt_Dayrecov_2:protegens1 -2.47075 0.26515 -9.318 < 2e-16
## Trtmt_Dayresist:protegens1 1.10425 0.30938 3.569 0.000358
## protegens1:CommRich -0.31366 0.08945 -3.507 0.000454
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 2.44064 0.35658 6.845 7.67e-12
## Heatcontrol:Trtmt_Dayresist:protegens1 -1.04441 0.39442 -2.648 0.008098
##
## (Intercept) ***
## Heatcontrol ***
## Trtmt_Dayrecov_2 ***
## Trtmt_Dayresist ***
## protegens1 ***
## community_expected_mu *
## CommRich ***
## Heatcontrol:Trtmt_Dayrecov_2 ***
## Heatcontrol:Trtmt_Dayresist ***
## Heatcontrol:protegens1 ***
## Trtmt_Dayrecov_2:protegens1 ***
## Trtmt_Dayresist:protegens1 ***
## protegens1:CommRich ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 ***
## Heatcontrol:Trtmt_Dayresist:protegens1 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# 48h heat duration
####################
# grab just the treatment with its associated control data
absDen_48h <- rbind(absDen_forFit %>% filter(Heat == "48", Day > 2),
absDen_forFit %>% filter(Heat == "control", Day > 2))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_48h$Trtmt_Day <- "resist"
absDen_48h$Trtmt_Day[absDen_48h$Day == 4] <- "recov_1"
absDen_48h$Trtmt_Day[absDen_48h$Day == 5] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_48h$Trtmt_Day <- as.factor(absDen_48h$Trtmt_Day)
absDen_48h$Heat <- droplevels(absDen_48h$Heat)
absDen_48h$resistant <- as.factor(absDen_48h$resistant)
absDen_48h$protegens <- as.factor(absDen_48h$protegens)
# fit different models:
absDen_mods48h <- fit_productivity_models(absDen_48h)
# check the simplest possible models for multicollinearity
check_collinearity(absDen_mods48h[["simple"]])
check_collinearity(absDen_mods48h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods48h, npar_of_glmmTMB_fit),
AIC = sapply(absDen_mods48h, AIC),
AICc = sapply(absDen_mods48h, AICc),
BIC = sapply(absDen_mods48h, BIC)) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC) %>% arrange(BIC)
# plot the best model for 48h:
print(plot_model_pred.CommRich(mod_list=absDen_mods48h, mod_name="+CommRich +prot"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods48h, mod_name="*prot*mu +CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
# plot the preferred model for the complete data:
print(plot_model_pred.CommRich(mod_list=absDen_mods48h, mod_name="*prot +mu + prot*CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
# check the fit and estimates of the best model for the complete data:
#simulateResiduals(fittedModel = absDen_mods48h[["*prot*mu +CommRich"]], plot = TRUE)
#summary(absDen_mods48h[["*prot*mu +CommRich"]])
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods48h[["*prot +mu + prot*CommRich"]], plot = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.6503375 0.6930755 0.8098628 0.6608687 0.6388946 0.7922922 0.6865314 0.540222 0.5938006 0.5679162 0.5248042 0.005801423 0.1950582 0.3731587 0.3880367 0.1832986 0.3625339 0.5414847 0.3098432 0.731147 ...
summary(absDen_mods48h[["*prot +mu + prot*CommRich"]])
## Family: genpois ( log )
## Formula:
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +
## community_expected_mu + protegens * CommRich
## Data: data_subset
##
## AIC BIC logLik deviance df.resid
## 3482.6 3541.0 -1725.3 3450.6 269
##
##
## Dispersion parameter for genpois family (): 273
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.87541 0.36970 21.302 < 2e-16
## Heatcontrol -0.05482 0.15455 -0.355 0.722804
## Trtmt_Dayrecov_2 -0.21690 0.17239 -1.258 0.208332
## Trtmt_Dayresist -4.78317 0.70978 -6.739 1.60e-11
## protegens1 -1.57374 0.25504 -6.171 6.80e-10
## community_expected_mu -0.71331 0.31059 -2.297 0.021640
## CommRich -0.04159 0.07091 -0.586 0.557584
## Heatcontrol:Trtmt_Dayrecov_2 -0.80020 0.22933 -3.489 0.000484
## Heatcontrol:Trtmt_Dayresist 4.68262 0.72165 6.489 8.65e-11
## Heatcontrol:protegens1 -0.07258 0.21600 -0.336 0.736868
## Trtmt_Dayrecov_2:protegens1 -0.01591 0.22686 -0.070 0.944073
## Trtmt_Dayresist:protegens1 1.36098 0.77106 1.765 0.077552
## protegens1:CommRich 0.04398 0.08902 0.494 0.621299
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 0.80333 0.31485 2.551 0.010727
## Heatcontrol:Trtmt_Dayresist:protegens1 -1.40448 0.79761 -1.761 0.078263
##
## (Intercept) ***
## Heatcontrol
## Trtmt_Dayrecov_2
## Trtmt_Dayresist ***
## protegens1 ***
## community_expected_mu *
## CommRich
## Heatcontrol:Trtmt_Dayrecov_2 ***
## Heatcontrol:Trtmt_Dayresist ***
## Heatcontrol:protegens1
## Trtmt_Dayrecov_2:protegens1
## Trtmt_Dayresist:protegens1 .
## protegens1:CommRich
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 *
## Heatcontrol:Trtmt_Dayresist:protegens1 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# Select the best model across all data subsets
####################
# for each information criterion, get the average across all data subsets
meanIC <- data.frame(pars = sapply(absDen_mods48h, npar_of_glmmTMB_fit),
AIC = sapply(absDen_mods6h, AIC) + sapply(absDen_mods12h, AIC) + sapply(absDen_mods24h, AIC) + sapply(absDen_mods48h, AIC),
AICc = sapply(absDen_mods6h, AICc) + sapply(absDen_mods12h, AICc) + sapply(absDen_mods24h, AICc) + sapply(absDen_mods48h, AICc),
BIC = sapply(absDen_mods6h, BIC) + sapply(absDen_mods12h, BIC) + sapply(absDen_mods24h, BIC) + sapply(absDen_mods48h, BIC)) %>%
mutate(AIC = AIC/4,
AICc = AICc/4,
BIC = BIC/4) %>%
mutate(dAIC = min(AIC)-AIC,
dAICc = min(AICc)-AICc,
dBIC = min(BIC)-BIC)
meanIC %>% arrange(BIC)
meanIC %>% arrange(AIC)
# clean up
rm(meanIC)
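As a side note, the dIC columns above are computed as min(IC) - IC, the negation of the conventional dAIC = AIC - min(AIC); either way the best model scores 0. If relative model support were wanted, Akaike weights follow directly from the deltas. A minimal sketch with hypothetical AIC values:

```r
# Akaike weights from a vector of AIC values (hypothetical numbers);
# a model's weight is exp(-dAIC/2) normalized over all candidate models.
aic <- c(m1 = 4891.5, m2 = 4895.2, m3 = 4910.0)
dAIC <- aic - min(aic) # conventional sign: best model gets 0
akaike_weights <- exp(-dAIC / 2) / sum(exp(-dAIC / 2))
round(akaike_weights, 3)
```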
Annoyingly enough, both the best model and the preferred model change once we consider the data without any extinctions…
Welp, I’m still going to use the “*prot +mu + prot*CommRich” model. There is one new result for late recovery at 48h: a positive effect of heat duration. When I re-ran the downstream analysis with the model preferred for the extinction data, ““, the results were the same. So this appears to reflect a genuine trend in the data rather than an issue with the model selection itself.
# plot the effect size contingent on protegens
# use the same model as above, "*prot +mu + prot*CommRich"
effect_6h_protegens <- eff_size(emmeans(absDen_mods6h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_6h),
sigma(absDen_mods6h[["*prot +mu + prot*CommRich"]]),
edf = df.residual(absDen_mods6h[["*prot +mu + prot*CommRich"]]))
effect_12h_protegens <- eff_size(emmeans(absDen_mods12h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_12h),
sigma(absDen_mods12h[["*prot +mu + prot*CommRich"]]),
edf = df.residual(absDen_mods12h[["*prot +mu + prot*CommRich"]]))
effect_24h_protegens <- eff_size(emmeans(absDen_mods24h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_24h),
sigma(absDen_mods24h[["*prot +mu + prot*CommRich"]]),
edf = df.residual(absDen_mods24h[["*prot +mu + prot*CommRich"]]))
effect_48h_protegens <- eff_size(emmeans(absDen_mods48h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_48h),
sigma(absDen_mods48h[["*prot +mu + prot*CommRich"]]),
edf = df.residual(absDen_mods48h[["*prot +mu + prot*CommRich"]]))
# use the overall preferred model for the data excluding extinctions:
#effect_6h_protegens <- eff_size(emmeans(absDen_mods6h[["*prot +CommRich"]], ~ Heat | Trtmt_Day*protegens + CommRich, data = absDen_6h),
# sigma(absDen_mods6h[["*prot +CommRich"]]),
# edf = df.residual(absDen_mods6h[["*prot +CommRich"]]))
#effect_12h_protegens <- eff_size(emmeans(absDen_mods12h[["*prot +CommRich"]], ~ Heat | #Trtmt_Day*protegens + CommRich, data = absDen_12h),
# sigma(absDen_mods12h[["*prot +CommRich"]]),
# edf = df.residual(absDen_mods12h[["*prot +CommRich"]]))
#effect_24h_protegens <- eff_size(emmeans(absDen_mods24h[["*prot +CommRich"]], ~ Heat | #Trtmt_Day*protegens + CommRich, data = absDen_24h),
# sigma(absDen_mods24h[["*prot +CommRich"]]),
# edf = df.residual(absDen_mods24h[["*prot +CommRich"]]))
#effect_48h_protegens <- eff_size(emmeans(absDen_mods48h[["*prot +CommRich"]], ~ Heat | Trtmt_Day*protegens + CommRich, data = absDen_48h),
# sigma(absDen_mods48h[["*prot +CommRich"]]),
# edf = df.residual(absDen_mods48h[["*prot +CommRich"]]))
# a function that extracts the confidence intervals from eff_size contingent on protegens
get_effsize_CIs <- function(eff_size_object, heat_trtmt) {
ci <- confint(eff_size_object) # compute once and reuse
data.frame(Heat = heat_trtmt,
# this function will fail with "*prot +CommRich",
# so you need to swap in the indices given in the trailing comments:
CommRich = ci[[5]], #[[4]]
Trtmt_Day = ci[[2]], #[[2]]
protegens = ci[[3]],
community_expected_mu = ci[[4]], # this whole line needs to be removed
# with community_expected_mu in the grid, effect.size and the CI columns
# sit one position later than in the reduced model:
effect_est = ci[[6]], #[[5]]
effect_loCI = ci[[9]], #[[8]]
effect_hiCI = ci[[10]]) #[[9]]
}
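Positional indexing into confint() output is fragile whenever the model formula changes (exactly the hazard the comments above flag). A more defensive alternative is to pull columns by name; a sketch assuming the standard emmeans summary column names (effect.size, lower.CL, upper.CL), with a hypothetical example frame standing in for the real confint() result:

```r
# Extract effect size and CI columns by name rather than position, assuming
# standard emmeans summary column names; ci_example is a hypothetical
# stand-in for confint(eff_size(...)).
extract_ci_by_name <- function(ci_df) {
  data.frame(effect_est  = ci_df[["effect.size"]],
             effect_loCI = ci_df[["lower.CL"]],
             effect_hiCI = ci_df[["upper.CL"]])
}
ci_example <- data.frame(contrast = "24 - control", Trtmt_Day = "resist",
                         protegens = "1", effect.size = 0.42,
                         lower.CL = 0.10, upper.CL = 0.74)
extract_ci_by_name(ci_example)
```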
# create a data.frame for plotting marginal effect sizes using a forest plot
productivity_protegens <- data.frame()
productivity_protegens <- rbind(productivity_protegens,
get_effsize_CIs(effect_6h_protegens, heat_trtmt = 6),
get_effsize_CIs(effect_12h_protegens, heat_trtmt = 12),
get_effsize_CIs(effect_24h_protegens, heat_trtmt = 24),
get_effsize_CIs(effect_48h_protegens, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
productivity_protegens$Trtmt_Day <- factor(productivity_protegens$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(productivity_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
#plot
ggplot(productivity_protegens,
aes(x = effect_est, y = as.factor(Heat), colour = Trtmt_Day, shape = protegens)) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = effect_loCI, xmax = effect_hiCI), height = 0.1) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Total Density",
y = "Heat duration (hrs)",
shape = "protegens\npresent?",
title = "*prot +mu + prot*CommRich")
# we can do a posthoc on this to illustrate statistically significant effects
posthocPROT_6h <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_12h <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_24h <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_48h <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_48h)
## NOTE: Results may be misleading due to involvement in interactions
# create a data.frame for plotting
prod_effects_protegens <- data.frame()
prod_effects_protegens <- rbind(prod_effects_protegens,
get_posthoc_YESprot(posthocPROT_6h, heat_trtmt = 6),
get_posthoc_YESprot(posthocPROT_12h, heat_trtmt = 12),
get_posthoc_YESprot(posthocPROT_24h, heat_trtmt = 24),
get_posthoc_YESprot(posthocPROT_48h, heat_trtmt = 48))
## note that for the decoupling plots I am using a Bonferroni-corrected alpha.
# let's get those confidence intervals because we will need them to plot decoupling below:
posthocPROT_6h_WIDER <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_6h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_12h_WIDER <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_12h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_24h_WIDER <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_24h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_48h_WIDER <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_48h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
# put these wider CIs into a table
widerCIs <- data.frame()
widerCIs <- rbind(widerCIs,
get_posthoc_YESprot(posthocPROT_6h_WIDER, heat_trtmt = 6),
get_posthoc_YESprot(posthocPROT_12h_WIDER, heat_trtmt = 12),
get_posthoc_YESprot(posthocPROT_24h_WIDER, heat_trtmt = 24),
get_posthoc_YESprot(posthocPROT_48h_WIDER, heat_trtmt = 48))
# rename the columns to remind us that this is the Bonferroni corrected alpha
colnames(widerCIs)[5:6] <- c("loCI_bonAlpha", "hiCI_bonAlpha")
# combine the wider CIs with the effect sizes
prod_effects_protegens <- inner_join(prod_effects_protegens,
widerCIs %>% select(-SE))
## Joining with `by = join_by(Trtmt_Day, protegens, est, groups, Heat)`
rm(widerCIs)
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
prod_effects_protegens$Trtmt_Day <- factor(prod_effects_protegens$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(prod_effects_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
# plot with group labels
ggplot(prod_effects_protegens,
aes(x = est, y = as.factor(Heat), colour = Trtmt_Day, shape=protegens)) +
facet_grid( ~ protegens) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = loCI, xmax = hiCI), height = 0.1) +
geom_text(position = position_dodge(width = 0.5),
aes(x=-0.0035, label=groups)) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Total Density",
y="Heat duration",
shape = "protegens\npresent?",
title = "*prot +mu + prot*CommRich")
decoupling_productivity <- prod_effects_protegens %>% select(-loCI, -hiCI)
# keep just the Bonferroni-corrected (wider) confidence intervals
decoupling_productivity <- decoupling_productivity %>%
rename(loCI = loCI_bonAlpha,
hiCI = hiCI_bonAlpha)
# for easier coding, rename the levels of Trtmt_Day
levels(decoupling_productivity$Trtmt_Day) <- c("resist", "early_recov", "late_recov")
# create data.frame for plotting
decoupling_productivity <- decoupling_productivity %>%
pivot_wider(names_from = Trtmt_Day,
values_from = c(est, loCI, hiCI, SE, groups))
# columns that indicate if resistance is significantly different from recovery
decoupling_productivity$early_recov_VS_resist <- mapply(are_groups_different,
decoupling_productivity$groups_early_recov,
decoupling_productivity$groups_resist)
decoupling_productivity$late_recov_VS_resist <- mapply(are_groups_different,
decoupling_productivity$groups_late_recov,
decoupling_productivity$groups_resist)
# clean up extra columns
decoupling_productivity <- decoupling_productivity %>% select(-groups_resist, -groups_early_recov, -groups_late_recov)
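are_groups_different() is defined earlier in the document; under the usual compact-letter-display convention it presumably amounts to checking whether two letter labels share no character. A hypothetical sketch of that rule:

```r
# Hypothetical sketch of a compact-letter-display comparison: two groups
# differ significantly iff their letter labels share no letter.
share_no_letter <- function(g1, g2) {
  !any(strsplit(g1, "")[[1]] %in% strsplit(g2, "")[[1]])
}
share_no_letter("a", "b")  # disjoint labels: significantly different
share_no_letter("ab", "b") # shared 'b': not significantly different
```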
# first plot the decoupling on early recovery
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
facet_grid(~protegens) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
geom_errorbar(aes(ymin = loCI_early_recov, ymax = hiCI_early_recov), width=0) +
# center the plot on 0,0:
scale_x_continuous(limits = c(-0.025, 0.025), expand = c(0, 0)) +
scale_y_continuous(limits = c(-0.0025, 0.0025), expand = c(0, 0)) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
labs(title = "Decoupling of productivity (NO extinct reps)",
x = "Resistance +/- 95% CI",
y = "Early Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# here's another way to plot it where the confidence intervals are shown as ellipses:
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
facet_grid(~protegens) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
geom_ellipse(aes(x0 = est_resist,
y0 = est_early_recov,
# radius on x direction:
a = hiCI_resist - est_resist,
# radius on y direction:
b = hiCI_early_recov - est_early_recov,
angle = 0)) +
labs(title = "Decoupling of productivity (NO extinct reps)",
x = "Resistance +/- 95% CI",
y = "Early Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# next plot the decoupling on later recovery
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
facet_grid(~protegens) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
geom_errorbar(aes(ymin = loCI_late_recov, ymax = hiCI_late_recov), width=0) +
# center the plot on 0,0:
scale_x_continuous(limits = c(-0.025, 0.025), expand = c(0, 0)) +
scale_y_continuous(limits = c(-0.005, 0.005), expand = c(0, 0)) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
labs(title = "Decoupling of productivity (NO extinct reps)",
x = "Resistance +/- 95% CI",
y = "Late Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# late recovery with CI plotted as ellipses:
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
facet_grid(~protegens) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
geom_ellipse(aes(x0 = est_resist,
y0 = est_late_recov,
# radius on x direction:
a = hiCI_resist - est_resist,
# radius on y direction:
b = hiCI_late_recov - est_late_recov,
angle = 0)) +
labs(title = "Decoupling of productivity (NO extinct reps)",
x = "Resistance +/- 95% CI",
y = "Late Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# finally estimate decoupling by getting the distance to the y=x line
# calculate decoupling between resistance and early recovery
early_decoupling <- t(with(decoupling_productivity,
mapply(estimate_decoupling,
resist_est = est_resist,
resist_hiCI = hiCI_resist,
recov_est = est_early_recov,
recov_hiCI = hiCI_early_recov)))
# add annotation
early_decoupling <- cbind(decoupling_productivity[,1:2],
early_decoupling)
ggplot(early_decoupling,
aes(x = as.factor(Heat), y = est_decoupling)) +
facet_grid(~protegens) +
geom_hline(yintercept = 0, colour = "grey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbar(position = position_dodge(width = 0.5),
aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
alpha=0.4, width=0.1) +
labs(title = "Early recovery (NO extinct reps)",
y = "Decoupling +/- 95% CI",
x = "Heat Duration (hrs)")
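Geometrically, the signed perpendicular distance from a point (x, y) to the identity line y = x is (y - x)/sqrt(2). The sketch below illustrates only that geometry; the actual estimate_decoupling() helper (defined earlier) may differ in detail, e.g. in how it propagates the CIs:

```r
# Signed perpendicular distance from (resist, recov) to the line y = x;
# positive when recovery exceeds resistance. Geometry sketch only -- the
# estimate_decoupling() helper used above may differ in detail.
dist_to_identity <- function(resist_est, recov_est) {
  (recov_est - resist_est) / sqrt(2)
}
dist_to_identity(resist_est = 0, recov_est = 1) # 1/sqrt(2)
```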
# calculate decoupling between resistance and late recovery
late_decoupling <- t(with(decoupling_productivity,
mapply(estimate_decoupling,
resist_est = est_resist,
resist_hiCI = hiCI_resist,
recov_est = est_late_recov,
recov_hiCI = hiCI_late_recov)))
# add annotation
late_decoupling <- cbind(decoupling_productivity[,1:2],
late_decoupling)
ggplot(late_decoupling,
aes(x = as.factor(Heat), y = est_decoupling)) +
facet_grid(~protegens) +
geom_hline(yintercept = 0, colour = "grey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbar(position = position_dodge(width = 0.5),
aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
alpha=0.4, width=0.1) +
scale_colour_viridis_d(option = "viridis", end=0.85) +
labs(title = "Late recovery (NO extinct reps)",
y = "Decoupling +/- 95% CI",
x = "Heat Duration (hrs)")
##############################
# effect sizes with protegens as non-focal
##############################
# But we are not interested in the details of protegens. Let's do the post-hoc again now averaging across the effects of protegens.
posthoc_6h <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day, data = absDen_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day, data = absDen_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day, data = absDen_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day, data = absDen_48h)
## NOTE: Results may be misleading due to involvement in interactions
# create a data.frame for plotting marginal effect sizes using a forest plot with the group labels
productivity_effects <- data.frame()
productivity_effects <- rbind(productivity_effects,
get_posthoc_NOprot(posthoc_6h, heat_trtmt = 6),
get_posthoc_NOprot(posthoc_12h, heat_trtmt = 12),
get_posthoc_NOprot(posthoc_24h, heat_trtmt = 24),
get_posthoc_NOprot(posthoc_48h, heat_trtmt = 48))
## note that for the decoupling plots I am using a Bonferroni-corrected alpha.
# let's get those confidence intervals because we will need them to plot decoupling below:
posthoc_6h_WIDER <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day, data = absDen_6h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h_WIDER <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day, data = absDen_12h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h_WIDER <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day, data = absDen_24h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h_WIDER <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day, data = absDen_48h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
# put these wider CIs into a table
widerCIs <- data.frame()
widerCIs <- rbind(widerCIs,
get_posthoc_NOprot(posthoc_6h_WIDER, heat_trtmt = 6),
get_posthoc_NOprot(posthoc_12h_WIDER, heat_trtmt = 12),
get_posthoc_NOprot(posthoc_24h_WIDER, heat_trtmt = 24),
get_posthoc_NOprot(posthoc_48h_WIDER, heat_trtmt = 48))
# rename the columns to remind us that this is the Bonferroni corrected alpha
colnames(widerCIs)[4:5] <- c("loCI_bonAlpha", "hiCI_bonAlpha")
# combine the wider CIs with the effect sizes
productivity_effects <- inner_join(productivity_effects,
widerCIs %>% select(-SE))
## Joining with `by = join_by(Trtmt_Day, est, groups, Heat)`
rm(widerCIs)
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
productivity_effects$Trtmt_Day <- factor(productivity_effects$Trtmt_Day,
levels = c("resist", "recov_1", "recov_2"))
levels(productivity_effects$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")
# plot with group labels
ggplot(productivity_effects,
aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = loCI, xmax = hiCI), height = 0.1) +
geom_text(position = position_dodge(width = 0.5),
aes(x=-0.008, label=groups)) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Total Density",
y = "Heat duration (hrs)",
title = "protegens as non-focal predictor (i.e., marginalized)")
#######
# finally, we will do a series of pairwise two-tailed t-tests to compare between heat durations
#######
# estimate the sample sizes
temp <- productivity_effects # copy the effects to temp
productivity_effects <- rbind(temp %>% filter(Heat == 6) %>% mutate(n = estimate_n(absDen_6h)),
temp %>% filter(Heat == 12) %>% mutate(n = estimate_n(absDen_12h)),
temp %>% filter(Heat == 24) %>% mutate(n = estimate_n(absDen_24h)),
temp %>% filter(Heat == 48) %>% mutate(n = estimate_n(absDen_48h)))
rm(temp)
# estimate the SD from the SE
productivity_effects <- productivity_effects %>% mutate(SD = SE * sqrt(n)) %>%
# re-order by Heat and Trtmt_Day
arrange(Heat, Trtmt_Day)
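run_ttest() is defined earlier in the document; a Welch two-sample t-test computed from summary statistics (mean, SD, n per group) presumably resembles the following sketch (function name hypothetical):

```r
# Welch two-sample t-test from summary statistics; df via the
# Welch-Satterthwaite approximation, p-value two-tailed.
welch_from_summary <- function(m1, sd1, n1, m2, sd2, n2) {
  v1 <- sd1^2 / n1
  v2 <- sd2^2 / n2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(-abs(t_stat), df)
  c(t_statistic = t_stat, df = df, pvalue = p)
}
```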
# all pairwise combinations of comparisons between the same treatment day for different durations
temp <- t(combn(c(1,4,7,10), 2))
combos <- rbind(temp, temp+1, temp+2)
rm(temp)
# loop through all the combinations and do the t-tests
prodEffects_ttests <- data.frame()
for(i in 1:nrow(combos)){
prodEffects_ttests <- rbind(prodEffects_ttests,
run_ttest(row_x = combos[i,1],
row_y = combos[i,2],
summary_stats_df = productivity_effects))
}
prodEffects_ttests$adjusted_p <- p.adjust(prodEffects_ttests$pvalue, method = "bonferroni")
prodEffects_ttests$Trtmt_Day <- productivity_effects$Trtmt_Day[combos[,1]]
prodEffects_ttests$Heat_1 <- productivity_effects$Heat[combos[,1]]
prodEffects_ttests$Heat_2 <- productivity_effects$Heat[combos[,2]]
print(prodEffects_ttests)
## t_statistic df pvalue adjusted_p Trtmt_Day Heat_1
## t 10.47327399 30.75149 1.154061e-11 2.077310e-10 Resistance 6
## t1 27.52904820 32.45710 4.475729e-24 8.056312e-23 Resistance 6
## t2 30.95878975 11.06125 4.243667e-12 7.638601e-11 Resistance 6
## t3 14.44861398 27.96645 1.694667e-14 3.050400e-13 Resistance 12
## t4 28.99508330 11.15798 7.358535e-12 1.324536e-10 Resistance 12
## t5 26.59511042 10.99768 2.476558e-11 4.457805e-10 Resistance 24
## t6 -0.07249429 32.00521 9.426597e-01 1.000000e+00 Early Recovery 6
## t7 13.22879424 31.94888 1.654272e-14 2.977690e-13 Early Recovery 6
## t8 1.84987737 16.06394 8.281448e-02 1.000000e+00 Early Recovery 6
## t9 13.29354339 28.39949 1.019381e-13 1.834887e-12 Early Recovery 12
## t10 1.89468118 15.97684 7.638020e-02 1.000000e+00 Early Recovery 12
## t11 -5.90091191 14.21475 3.628640e-05 6.531553e-04 Early Recovery 24
## t12 14.19702257 31.87762 2.481735e-15 4.467122e-14 Late Recovery 6
## t13 5.97944421 27.94791 1.948392e-06 3.507105e-05 Late Recovery 6
## t14 -6.79305127 15.70073 4.770715e-06 8.587286e-05 Late Recovery 6
## t15 -11.29482419 23.70829 5.042086e-11 9.075755e-10 Late Recovery 12
## t16 -15.47112543 15.82356 5.674596e-11 1.021427e-09 Late Recovery 12
## t17 -10.43356335 12.48399 1.585933e-07 2.854680e-06 Late Recovery 24
## Heat_2
## t 12
## t1 24
## t2 48
## t3 24
## t4 48
## t5 48
## t6 12
## t7 24
## t8 48
## t9 24
## t10 48
## t11 48
## t12 12
## t13 24
## t14 48
## t15 24
## t16 48
## t17 48
# these p-values seem overly optimistic. Use alpha = 1*10^-3
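For reference, the Bonferroni adjustment used above is simply each p-value multiplied by the number of tests, capped at 1:

```r
# Bonferroni adjustment: p * (number of tests), capped at 1.
p <- c(0.001, 0.02, 0.2)
p.adjust(p, method = "bonferroni") # 0.003 0.06 0.6
pmin(1, p * length(p)) # identical result
```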
################################
# Plot figure for main text: Figure 4b
################################
png(filename="./figures/Fig4B_plot.png", width = 4.48, height = 2.61, units = "in", res=300)
ggplot(productivity_effects,
aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
geom_vline(xintercept = 0, colour="darkgrey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbarh(position = position_dodge(width = 0.5),
aes(xmin = loCI, xmax = hiCI), height = 0.15) +
scale_colour_manual(values=trtmt_pal) +
labs(x = "Effect Size on Productivity",
y="Heat Duration (hrs)",
colour = "Treatment\nDay") +
theme(legend.position="none")
dev.off()
## png
## 2
decoupling_productivity <- productivity_effects %>% select(-loCI, -hiCI)
# keep just the Bonferroni-corrected (wider) confidence intervals
decoupling_productivity <- decoupling_productivity %>%
rename(loCI = loCI_bonAlpha,
hiCI = hiCI_bonAlpha)
# for easier coding, rename the levels of Trtmt_Day
levels(decoupling_productivity$Trtmt_Day) <- c("resist", "early_recov", "late_recov")
# create data.frame for plotting
decoupling_productivity <- decoupling_productivity %>%
select(-n, -SD) %>%
pivot_wider(names_from = Trtmt_Day,
values_from = c(est, loCI, hiCI, SE, groups))
# columns that indicate if resistance is significantly different from recovery
decoupling_productivity$early_recov_VS_resist <- mapply(are_groups_different,
decoupling_productivity$groups_early_recov,
decoupling_productivity$groups_resist)
decoupling_productivity$late_recov_VS_resist <- mapply(are_groups_different,
decoupling_productivity$groups_late_recov,
decoupling_productivity$groups_resist)
# clean up extra columns
decoupling_productivity <- decoupling_productivity %>% select(-groups_resist, -groups_early_recov, -groups_late_recov)
# first plot the decoupling on early recovery
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
#facet_grid(~CommRich) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
geom_errorbar(aes(ymin = loCI_early_recov, ymax = hiCI_early_recov), width=0) +
# center the plot on 0,0:
scale_x_continuous(limits = c(-0.019, 0.019), expand = c(0, 0)) +
scale_y_continuous(limits = c(-0.0014, 0.0014), expand = c(0, 0)) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
labs(title = "Decoupling of productivity (NO extinct reps)",
x = "Resistance +/- 95% CI",
y = "Early Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# an alternative view of the same comparison, with the 95% confidence intervals drawn as ellipses:
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
# geom_ellipse() is provided by the ggforce package
geom_ellipse(aes(x0 = est_resist,
y0 = est_early_recov,
# radius on x direction:
a = hiCI_resist - est_resist,
# radius on y direction:
b = hiCI_early_recov - est_early_recov,
angle = 0)) +
labs(title = "Decoupling of productivity (NO extinct reps)",
x = "Resistance +/- 95% CI",
y = "Early Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# next, plot resistance against late recovery
fig5a <- ggplot(decoupling_productivity,
aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=2, aes(fill=as.factor(late_recov_VS_resist))) +
geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
geom_errorbar(aes(ymin = loCI_late_recov, ymax = hiCI_late_recov), width=0) +
# center the plot on 0,0:
scale_x_continuous(limits = c(-0.019, 0.019), expand = c(0, 0)) +
scale_y_continuous(limits = c(-0.003, 0.003), expand = c(0, 0)) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
labs(x = "Resistance",
y = "Late Recovery",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
print(fig5a + labs(title = "Decoupling of productivity (NO extinct reps)"))
################################
# Plot figure for main text: Figure 5a
################################
png(filename="./figures/Fig5A.png", width = 6.25, height = 3.68, units = "in", res=300)
print(fig5a)
dev.off()
## png
## 2
# late recovery with CI plotted as ellipses:
ggplot(decoupling_productivity,
aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
geom_hline(yintercept = 0, colour="grey") +
geom_vline(xintercept = 0, colour="grey") +
geom_abline(slope = 1) +
geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
scale_fill_manual(values=c("white", "black")) +
geom_ellipse(aes(x0 = est_resist,
y0 = est_late_recov,
# radius on x direction:
a = hiCI_resist - est_resist,
# radius on y direction:
b = hiCI_late_recov - est_late_recov,
angle = 0)) +
labs(title = "Decoupling of productivity (NO extinct reps)",
x = "Resistance +/- 95% CI",
y = "Late Recovery +/- 95% CI",
colour = "Heat\nDuration",
fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
# finally, quantify decoupling as the perpendicular distance from each point to the y = x (1:1) line
# calculate decoupling between resistance and early recovery
early_decoupling <- t(with(decoupling_productivity,
mapply(estimate_decoupling,
resist_est = est_resist,
resist_hiCI = hiCI_resist,
recov_est = est_early_recov,
recov_hiCI = hiCI_early_recov)))
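# estimate_decoupling() is defined earlier in this document. As a rough
# sketch of the geometry only (assumptions: decoupling is the signed
# perpendicular distance from the point (resistance, recovery) to the 1:1
# line, and the two CI half-widths are propagated in quadrature):
estimate_decoupling_sketch <- function(resist_est, resist_hiCI,
                                       recov_est, recov_hiCI) {
  est <- (recov_est - resist_est) / sqrt(2)
  halfwidth <- sqrt((resist_hiCI - resist_est)^2 +
                    (recov_hiCI - recov_est)^2) / sqrt(2)
  c(est_decoupling = est,
    loCI_decoupling = est - halfwidth,
    hiCI_decoupling = est + halfwidth)
}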
# add annotation
early_decoupling <- cbind(decoupling_productivity[,1:2],
early_decoupling)
ggplot(early_decoupling,
aes(x = as.factor(Heat), y = est_decoupling)) +
geom_hline(yintercept = 0, colour = "grey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbar(position = position_dodge(width = 0.5),
aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
alpha=0.4, width=0.1) +
labs(title = "Early recovery (NO extinct reps)",
y = "Decoupling +/- 95% CI",
x = "Heat Duration (hrs)")
# calculate decoupling between resistance and late recovery
late_decoupling <- t(with(decoupling_productivity,
mapply(estimate_decoupling,
resist_est = est_resist,
resist_hiCI = hiCI_resist,
recov_est = est_late_recov,
recov_hiCI = hiCI_late_recov)))
# add annotation (note: selecting columns by name rather than by the
# positional indices c(1:2, 15) would be more robust to column reordering)
late_decoupling <- cbind(decoupling_productivity[,c(1:2, 15)],
late_decoupling)
# plot for main text:
fig5b <- ggplot(late_decoupling,
aes(x = as.factor(Heat), y = est_decoupling)) +
geom_hline(yintercept = 0, colour = "grey") +
geom_point(position = position_dodge(width = 0.5)) +
geom_errorbar(aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
width=0.05) +
labs(y = "Decoupling",
x = "Heat Duration (hrs)")
print(fig5b + labs(title = "Late recovery (NO extinct reps)"))
################################
# Plot figure for main text: Figure 5b
################################
png(filename="./figures/Fig5B.png", width = 4.7, height = 2.0, units = "in", res=300)
print(fig5b)
dev.off()
## png
## 2
# clean up (effect_6h through effect_48h were never created in this session, so they are omitted from the rm() call to avoid "object not found" warnings)
rm(absDen_6h, absDen_12h, absDen_24h, absDen_48h, absDen_mods6h, absDen_mods12h, absDen_mods24h, absDen_mods48h,
combos, decoupling_productivity, early_decoupling, late_decoupling,
effect_6h_protegens, effect_12h_protegens, effect_24h_protegens, effect_48h_protegens,
posthoc_6h, posthoc_12h, posthoc_24h, posthoc_48h, posthoc_6h_WIDER, posthoc_12h_WIDER, posthoc_24h_WIDER, posthoc_48h_WIDER, posthocPROT_6h, posthocPROT_12h, posthocPROT_24h, posthocPROT_48h, posthocPROT_6h_WIDER, posthocPROT_12h_WIDER, posthocPROT_24h_WIDER, posthocPROT_48h_WIDER,
prod_effects_protegens, prodEffects_ttests, productivity_effects, productivity_protegens,
fig5a, fig5b)